REMARKS 

Applicant respectfully requests reconsideration of the application. 



I. Real party in interest 

The real party in interest is the assignee of this application, ATI International SRL, a 
Society with Restricted Liability chartered in Barbados, West Indies. ATI International is a 
corporate affiliate of ATI Technologies, Inc. of Toronto, Ontario, Canada. 

II. Related Appeals and Interferences 

Applicant is unaware of any related appeals or interferences. 

III. The status of the claims 

Kindly amend the claims as indicated in Exhibit 2. A complete clean copy of all claims 
involved in the request for reconsideration (as amended during the course of this application), 
and including those now added by amendment, is attached as Exhibit 1 to this paper. 

Claims 1-35, 39-48, and 50-80 are pending in the application. Of these, claims 1, 2, 10, 
14, 19, 24, 25, 30, 39, 50, 61 and 70 are independent. Claims 1-35, 39-48, 50-52 and 54-57 are 
nominally rejected (though as noted below, the Office Action is inadequate to raise any prima 
facie rejection in the manner required by Chapter 2100 of the MPEP, and thus no rejections 
exist). Claims 36-38 and 49 are cancelled. Claim 53 was omitted inadvertently in the 
amendment that added claim 54, and is now added by amendment. No rejection of claim 58 was 
raised in the Action of March 7, 2002. Claims 59-80 are added by amendment. 

The claims for which reconsideration is sought are 1-35, 39-48, and 50-57. Initial 
consideration is requested for claims 58-80. 

IV. Status of amendments 

Because the application is not under final rejection, amendments proposed in this paper 
may be entered as of right. 

The Office Action requests that the first paragraph of the specification be updated to
reflect changes in status of the parent applications. None of the parent applications listed on 
page 1 of the specification have changed status. 

A number of claims are amended purely to assist the Examiner in examining the claims. 
These amendments are not made in response to any statutory rejection, and do not narrow the 
claims. For example, a number of claims formerly recited "an emulated architecture" and are 
now amended to recite "the instruction's native architecture." 

A number of claims are amended to recite that one action is based "at least in part" on 
some condition. To the extent that this affects the scope of the claims at all, it broadens them. 
These amendments are not addressed to any statutory ground of rejection. 

V. Telephone interviews of March 7, 2002 and June 26, 2002 

Applicant thanks Examiner Eng and Supervisory Examiner Sheikh for an extensive 
telephone interview of June 26, 2002 and a brief interview on or about March 7, 2002. The 
following issues were discussed, with agreements as indicated. 

A. Brief interview of March 7, 2002 

During the last week of February and first week of March, 2002, Applicant left four voice 
mails for the Examiner and Supervisory Examiner indicating that an interview might be helpful 
in clearing up any remaining issues of claim scope. 

On or about March 7, 2002, Applicant reached the Examiner for a live but brief telephone 
call. In this brief interview, the Examiner indicated that he understood the claims well enough to 
examine them and to apply a new reference, and that no extensive interview would be necessary 
to advance prosecution. 

B. Understanding of the claims 

Much of the interview of June 26, 2002 was taken up with a survey of the technology. A 
paraphrase of this overview appears at section VI, at page 10, of this paper. 

C. Statements of utility and enablement 

In the interview, the Examiner requested indications of statements of utility in the 
specification. One example appears at page 32, lines 18-20 of the specification:

First, the Tapestry machine exposes both the native RISC instruction set and the
X86 instruction set, so that a single program can be coded in both, with freedom 
to call back and forth between the two. . . . Second, an X86 program may be 
translated into native RISC code, so that X86 programs can exploit many more of 
the speed opportunities available in a RISC instruction set. This second approach 
is enabled by profiler 400, prober 600, binary translator, and certain features of
the memory manager (see sections V through VIII, infra). 

and another at page 30, lines 20-30: 

Profiler 400 records details of the execution flow of the X86 program. . . . Hot 
spot detector 122 analyzes the profile to find "hot spots," portions of the program 
that are frequently executed. When a hot spot is detected, a binary translator 124 
translates the X86 instructions of the hot spot into optimized native Tapestry 
code, called "TAXi code." During emulation of the X86 program, prober 600 
monitors the program flow for execution of X86 instructions that have been 
translated into native code. When prober 600 detects that translated native 
Tapestry code exists corresponding to the X86 code about to be executed, and 
some additional correctness predicates are satisfied, prober 600 redirects the IP 
[instruction pointer] to fetch instructions from the translated native code instead 
of from the X86 code. Probing is discussed in greater detail in [section VI of the 
specification, pages 100-116].

As discussed in more detail at section IX.D.1.c at page 42, below, these statements are
entitled to a presumption of correctness. Any further rejection or requirement relating to these 
issues must be accompanied by the explanation mandated by MPEP §§ 2164.04, 2164.05, and 



D. Suggested amendments to the claims 

The Examiner suggested that the word "likelihood" be changed, suggesting that the 
claims could be phrased in a manner more idiomatic to the ordinary use in the art. The result of 
this portion of the conversation is discussed in section DCC.14.C of this paper, at page 39. 

The Examiner also suggested several other claim amendments, but in the process of
discussing these proposed amendments, it became clear that none was required for novelty,
non-obviousness, definiteness or enablement. It was agreed that claims need not "teach" the
invention, and that no further rejections would be raised based on such grounds.

Applicant further notes that most of the Examiner's suggestions appear in claims already
pending. For example, the idea that two given instructions are drawn from the "same program"
is inherent in claim 11: the input and output of a binary translator, in contexts where execution
is transferred between them, constitute "the same program." Cases where "an alternate coding of
instructions" does exist are recited in claims 16, 17 and 51.



E. Consideration of the Geppert reference 

Agreement was reached that the Geppert reference, submitted in three prior Information
Disclosure Statements, would now be considered pursuant to MPEP § 2133.03(b)(IV)(B): 

B. Nonprior Art Publications Can Be Used as Evidence ... 

Abstracts identifying a product's vendor containing information useful to 
potential buyers, . . . along with the date of product release or installation before 
the inventor's critical date may provide sufficient evidence of prior sale by a third 
party to support a rejection based on 35 U.S.C. 102(b) or 103. In re Epstein, 32
F.3d 1559, 31 USPQ2d 1817 (Fed. Cir. 1994) (Examiner's rejection was based on
nonprior art published abstracts which disclosed software products meeting the 
claims. The abstracts specified software release dates and dates of first 
installation which were more than 1 year before applicant's filing date.). 

See also GFI Corp. v. Franklin Corp., 265 F.3d 1268, 1274, 60 USPQ2d 1141, 1143-44 (Fed.
Cir. 2001) ("Materiality is not limited to prior art but instead embraces any information that a
reasonable examiner would be substantially likely to consider important in deciding whether to
allow an application to issue as a patent.") (italic in original, underline added); MPEP § 2128.

A fourth IDS and 1449, and another copy of the Geppert reference, are now enclosed. 

Applicant requests a checked-off copy of the 1449. 

F. "Misleading arguments" 

It was agreed (subject to conferring with other Office personnel) that there would be 
some clarification of the record, to reassure future readers of this prosecution history that no 
arguments of the Response filed August 10, 2001 were misleading, and that the assertion at page 
3, line 18 of the Action of December 2001 is withdrawn. Applicant suggests the following 
language for inclusion in the Examiner's next paper: 

The Examiner wishes to clarify the record, and acknowledges that 
Applicant's arguments in the Response filed August 10, 2001 were in no way 
"misleading" or otherwise improper. The accusation of "misleading argument" 
made in the Office Action of December 2001 is withdrawn. 

VI. Summary of the invention

The legal scope of the invention is set forth in the claims; certain specific embodiments 
are described in the specification. Among other purposes, and without prejudice to the scope of
the claims, the invention¹ may be useful to enable a single computer to execute two different
instruction set architectures (ISA's), for example, an "old" ISA (e.g., the Intel X86 instruction
set) and a "new" ISA (e.g., a RISC instruction set). A single program may mix-and-match 
routines coded in one ISA with routines coded in the other. The routines may freely call back 
and forth between ISA's. The following describes the overall contextual setting for the 
technology, so that the specific inventions of the claims may be more easily understood. 

One desirable feature of a reliable two-ISA computer is that the original program text and 
the control data structures of programs should not be altered by the execution process. For 
example, if an X86 program uses self-modifying code, any alteration of the X86 instruction text 
by an emulator might conflict with the self-modification, and would cause the emulation to 
deviate from the behavior observed if the program were executed on a native X86 computer. 
Thus, the technique of the Morley '982 patent, discussed in the Office Action of April 2001, 
might be inoperable in such cases. Similarly, if the X86 segment descriptors are managed by an 
unmodified X86 program, such as Microsoft Windows, an approach similar to the Richter '684 
reference, which modifies system control data structures, may be inoperable, as discussed in 
section IX.B.5 at page 23 below. 

Turning to the specification and Figs. 1a and 1b, Tapestry is a fast RISC processor 100,
with hardware and software features that (in addition to the RISC instruction set provided in
native mode) provide a correct implementation of an Intel X86-family processor. ("X86" refers
to the family including the 8086, 80186, ... 80486, Pentium, and Pentium Pro.) Tapestry
processor 100 fetches (stage 110) instructions from instruction cache (I-cache) 112, or from
memory 118, from a location specified by IP (instruction pointer, generally known as the PC or
program counter in other machines) 114, with virtual-to-physical address translation provided by
I-TLB (instruction translation look-aside buffer) 116. The instructions fetched from I-cache 112
are executed by a RISC execution pipeline 120. In addition to the services provided by a
conventional I-TLB, I-TLB 116 stores several bits 182, 186 that choose an instruction
environment in which to interpret the fetched instruction bytes. One bit 182 selects an
instruction set architecture (ISA) for the instructions on a memory page. Thus, the Tapestry
hardware can readily execute either native instructions or the instructions of the Intel X86 ISA.
This feature is discussed in more detail in section II of the specification, at pages 43-45.

1 Much of the discussion of the "invention" herein is directed to the entire system disclosed in the
application. The term "invention" is used in this paper in its informal sense, and is not intended to be a
substitute for the legal definition of the invention set out in the claims. For example, the advantages and
results are discussed here only to help establish context for examination, not to further define the legal
"invention" of the claims.

The execution of a program encoded in the X86 ISA is typically slower than execution of
the same program that has been compiled into the native Tapestry ISA. Profiler 400 records 
details of the execution flow of the X86 program. Profiling is discussed in greater detail in 
section V of the specification, pages 72-100. Hot spot detector 122 analyzes the profile to find 
"hot spots," portions of the program that are frequently executed. When a hot spot is detected, a 
binary translator 124 translates the X86 instructions of the hot spot into optimized native 
Tapestry code, called "TAXi code." The correspondence between X86 code and translated 
native Tapestry code is maintained in PIPM (Physical Instruction Pointer Map) 602. During 
emulation of the X86 program, prober 600 monitors the program flow for execution of X86 
instructions that have been translated into native code. When prober 600 detects that translated 
native Tapestry code exists corresponding to the X86 code about to be executed, and some 
additional correctness predicates are satisfied, prober 600 redirects the IP to fetch instructions 
from the translated native code instead of from the X86 code. That is, as the X86 program is 
executed, and control reaches a portion of the X86 program that has been translated, the 
computer will automatically recognize that a RISC alternative coding exists, and will transfer 
control to the version translated into the RISC ISA. Probing is discussed in greater detail in 
section VI of the specification, at pages 100-116.
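The profile, hot spot, translate, and probe sequence described above can be pictured with a small software model. The sketch below is a hypothetical illustration only, not the Tapestry implementation: the class name `TaxiFlowModel`, the threshold `HOT_THRESHOLD`, and the string labels standing in for translated native code are all assumptions, binary translation is stubbed out, and the "additional correctness predicates" checked by prober 600 are omitted.

```python
from dataclasses import dataclass, field

# Hypothetical threshold: executions before an X86 entry point is treated as a hot spot.
HOT_THRESHOLD = 3


@dataclass
class TaxiFlowModel:
    """Toy model of profiler -> hot spot detector -> binary translator -> prober.

    All names are illustrative assumptions, not taken from the specification.
    """
    profile: dict = field(default_factory=dict)       # X86 address -> execution count
    translations: dict = field(default_factory=dict)  # X86 address -> native code label

    def record_execution(self, x86_ip: int) -> None:
        # Profiler: count how often each X86 entry point is reached.
        self.profile[x86_ip] = self.profile.get(x86_ip, 0) + 1
        # Hot spot detector: once the count reaches the threshold, "translate" it (stubbed).
        if self.profile[x86_ip] >= HOT_THRESHOLD and x86_ip not in self.translations:
            self.translations[x86_ip] = f"taxi_{x86_ip:#x}"

    def next_fetch(self, x86_ip: int) -> tuple:
        # Prober: redirect to translated native code if it exists; otherwise keep emulating X86.
        if x86_ip in self.translations:
            return ("native", self.translations[x86_ip])
        return ("x86", x86_ip)
```

In this model, the first two visits to address 0x1000 continue in X86 emulation; after the third, `next_fetch(0x1000)` returns the native label, mirroring the automatic transfer of control to the translated version without any change to the X86 program text.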

First, the Tapestry machine exposes both the native RISC instruction set and the X86
instruction set, so that a single program can be coded in both, with freedom to call back and forth
between the two. This approach is enabled by ISA bit 180, 182 control on converter 136, or in
an alternative embodiment, by ISA bit 180, 182, calling convention bit 200, semantic context
record 206, and the corresponding exception handlers (see section IV of the specification, pages
63-72). Second, an X86 program (or portions thereof) may be translated into native RISC code,
so that X86 programs can exploit many more of the speed opportunities available in a RISC
instruction set. This second approach is enabled by profiler 400, prober 600, binary translator,
and certain features of the memory manager (see sections V through VIII of the specification,
pages 72-116). Third, these two approaches cooperate to provide an additional level of benefit.

Some aspects of the invention relate to several different "side tables" that are used to 
implement these features. 

In one particular embodiment that is the subject of several of the claims, a side table
(PFAT 172) has entries 174 that correspond to regions of X86 instructions. The probe bits 624
of PFAT entries 174 give an approximate indication, a likelihood estimate, of whether there is a
RISC translated code segment for any of the code in the region corresponding to the PFAT
entry.² Another side table (PIPM 602) is consulted when the PFAT table access indicates that
the existence of translated code is likely. The PIPM then gives the final, definitive answer of
whether execution should be transferred to RISC code, and where the relevant RISC code is
located in memory. This embodiment is the focus of Figs. 6a-6c and section VI (pages 100-116).
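The two-level lookup just described (a coarse likelihood hint in PFAT 172, backed by a definitive map in PIPM 602) can be sketched as follows. This is a hypothetical model, not the circuitry of the specification: the class `TwoLevelProbe`, the 4 KB region size, one probe bit per region, and the lookup counter are all assumptions chosen to show why an approximate hint is safe. A set bit only triggers the exact lookup, so a stale hint costs time but never causes a wrong transfer.

```python
from __future__ import annotations

# Hypothetical region size: each PFAT-like entry covers one 4 KB page.
PAGE_SHIFT = 12


class TwoLevelProbe:
    """Coarse hint table (PFAT-style probe bits) backed by an exact map (PIPM-style).

    Class and method names are illustrative assumptions.
    """

    def __init__(self) -> None:
        self.probe_bits: set[int] = set()  # page numbers that *may* hold translated entry points
        self.pipm: dict[int, int] = {}     # exact X86 address -> native code address
        self.pipm_lookups = 0              # how often the slower exact map is consulted

    def add_translation(self, x86_addr: int, native_addr: int) -> None:
        self.pipm[x86_addr] = native_addr
        self.probe_bits.add(x86_addr >> PAGE_SHIFT)

    def remove_translation(self, x86_addr: int) -> None:
        # The probe bit is deliberately left set: it is only a likelihood hint,
        # so a stale bit costs an extra PIPM lookup but never gives a wrong answer.
        self.pipm.pop(x86_addr, None)

    def probe(self, x86_addr: int) -> int | None:
        if (x86_addr >> PAGE_SHIFT) not in self.probe_bits:
            return None                    # fast path: translation unlikely, skip the PIPM
        self.pipm_lookups += 1
        return self.pipm.get(x86_addr)     # definitive answer: native address, or None
```

For example, after `add_translation(0x401000, 0x9000)`, probing 0x401010 (same page, no translation) consults the PIPM and returns `None`, while probing 0x500000 returns `None` without touching the PIPM at all.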

In another example embodiment, ISA bits 180, 182, 194 each correspond to a region of 
memory, telling whether instructions in the region are to be interpreted in the X86 ISA or in the 
RISC ISA. When execution flows from a region whose ISA bit indicates one ISA to a region 
indicating the other, the computer reconfigures itself so that execution may resume in the new 
ISA, and so that the result of the mixed-ISA computation will be the same as the result that 
would be obtained if the program were coded in a single ISA. (In one embodiment, ISA bits 180 
are stored in the same physical PFAT 172 as the probe bits 624. In other embodiments, these 
two conceptually-distinct sets of bits could be stored in distinct storage structures. For example, 
copies 182 of the ISA bits are cached in the instruction translation look-aside buffer (I-TLB) 116, 
and may be stored with or separately from the probe bits 624). This embodiment is the focus of 
Figs. 3a-3o and section II of the specification (pages 44-46). 
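The per-region ISA selection described above can be illustrated by walking a trace of fetch addresses and reporting each point where execution crosses into a region marked with the other ISA. The sketch assumes, purely for illustration, one ISA marking per 4 KB page held in a dictionary standing in for ISA bits 180, 182, with unlisted pages defaulting to X86. The function name `isa_transitions` is hypothetical, and the reconfiguration itself is only marked, not modeled.

```python
from enum import Enum

# Hypothetical region granularity for the ISA bits (one marking per 4 KB page).
PAGE_SHIFT = 12


class Isa(Enum):
    X86 = "X86"
    RISC = "RISC"


def isa_transitions(isa_bits: dict, fetch_trace: list, default: Isa = Isa.X86) -> list:
    """Return (address, old_isa, new_isa) for each fetch that enters a region of the other ISA.

    `isa_bits` maps page number -> Isa and stands in (as an assumption) for ISA bits 180/182;
    pages not listed use `default`. Each reported transition marks a point where the
    machine would reconfigure before resuming execution in the new ISA.
    """
    transitions = []
    current = None
    for addr in fetch_trace:
        isa = isa_bits.get(addr >> PAGE_SHIFT, default)
        if current is not None and isa is not current:
            transitions.append((addr, current, isa))
        current = isa
    return transitions
```

With page 1 marked RISC, the trace `[0x0000, 0x0004, 0x1000, 0x1004, 0x0008]` yields two transitions: into RISC at 0x1000 and back to X86 at 0x0008.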



2 In some embodiments, the ISA bits 180 and probe bits 624 may be stored together in a single 
table. In others, they might be stored separately. 

A third embodiment includes a "calling convention" (CC) bit 196, 200. In any computer,
data that are to be passed from one routine of a program to another routine are passed according
to a "calling convention," an agreement among software components for how data are to be
passed from one component to the next. A given system's calling convention is typically defined
by the designers of the computer system or by designers of compilers for the system. The CC bit
196, 200 of PFAT 172 and I-TLB 116 is disclosed in Figs. 2a-2c and section IV (pages 64-73).

The TAXi system uses interrupts (i) to seize control of the execution of a program, (ii) to
transfer control from the X86 code to the translated RISC code, (iii) to alter machine state so
that coding assumptions embodied in one code segment are rendered consistent with coding
assumptions of a segment to which control is transferred, and then (iv) to transfer control back to
the appropriate point in the X86 code at the end of the translated segment. The TAXi system
handles conventional interrupt-driven functions like page faults or other synchronous or 
asynchronous execution interrupts. In addition, the TAXi system adds a number of 
unconventional functions like changing the ISA in which the next instructions will be executed, 
and many of these functions are performed by interrupts. Some interrupts are generated by 
instructions like TRAP, that are architecturally-defined to generate interrupts. In the TAXi 
system, some interrupts are generated on simple instructions like ADD's or JUMP's that are not
architecturally-defined to generate an interrupt. For example, if a translated hot spot starts with a 
simple integer ADD instruction (the definition in the X86 architecture of an integer ADD 
instruction does not call for an interrupt to be raised), the side tables are set to values that cause 
this normally-interrupt-free instruction to raise an interrupt, so that the TAXi system can gain 
control and transfer execution over to the translated RISC code. Some control transfer 
instructions that are architecturally-defined to transfer control to one address are altered, so that 
control is transferred to a destination other than the architecturally-defined destination. 

Many of the claims of this application relate to techniques for (i) recognizing when X86 
execution has reached a hot spot for which a translation exists, when the instruction text itself 
and control data structures should not be changed, (ii) recognizing when the hardware should 
switch modes from executing in X86 mode to RISC mode, or vice-versa, and then 
(iii) transferring control back to the X86 code at the point corresponding to the end of the 
translated hot spot, all without altering the original X86 program text. 

Many of the claims of this application are directed to computer design techniques that
have broad applicability beyond the specific context described above. Thus, unless a claim
specifically recites "two distinct ISA's," "emulation of a non-native ISA," "raising an interrupt,"
etc., it should be understood that the claim is not so limited. The discussion in this preliminary
section is merely to provide context for understanding the technology, not to state the scope of
the claims. Therefore, it would not be helpful to examine the application based on the above 
description; the focus of examination should remain on the claims themselves. 

VII. Issues presented for reconsideration 

The following issues are presented for reconsideration: 

a. whether any enablement rejection under 35 U.S.C. § 112, ¶ 1 has been raised against
claims 1-35, 39-48, and 50-57.

b. whether any vagueness rejection under 35 U.S.C. § 112, ¶ 2 has been raised against
claims 1-35, 39-48, and 50-57.

c. whether any obviousness rejection under 35 U.S.C. § 103 has been raised against
claims 1-35, 39-48, and 50-57 based on U.S. Patent No. 5,481,684 to Richter.

VIII. Grouping of claims 

The claims are grouped in a number of groups according to individual issues. The groups 
are defined, and stand or fall together or separately, as discussed in section IX, below. 

IX. Argument 

A. Preliminary statement 

Applicant assures the Examiner: the claims mean what they say. With only one
exception ("data manipulation behavior," see section IX.C.4, below), the claims are drafted using 
only conventional terms of art and other recognized idioms, in their commonly-used sense. 
Applicant requests that the Examiner read the claims carefully, and examine the claims as they 
are stated. The Examiner should carefully avoid paraphrasing the claims (it appears that several 
of the "rejections" in the Action of March 2002 are based purely on misquotation of the claims, 
and not on any defect in the claims themselves), or otherwise straying from the exact words of 
the claims. Surprise at the unconventional features of the claims should not be turned into 
rejections for vagueness or non-enablement. Similarly, the surprising combination of elements
in the claims suggests non-obviousness, not obviousness.

B. Obviousness 

Claims 1-52 and 54-57 are nominally rejected under 35 U.S.C. § 103 over U.S. Patent No.
5,481,684 to Richter. 

1. Group I: claims 1-9, 17, 23, 34, 35, 41, 50-56, and 78-80 

The Action of March 2002 purports to reject claim 2 over the Richter '684 patent. Claim 
2 may be considered as a representative claim of a group that includes claims 1-9, 17, 23, 34, 35, 
41, 50-56, and 78-80 (Group I): if claim 2 stands, the rest of Group I stands with claim 2. If 
claim 2 falls, then the other claims must be considered separately to the extent discussed
elsewhere in this Response. Claim 2 recites as follows: 

2. A method, comprising the steps of: 

as part of the basic instruction cycle of executing an instruction of a non-supervisor
mode program executing on a computer, consulting a table, the table
having entries that are indexed by the address within an address space of 
instructions executed, entries of the table containing attributes of instructions 
whose addresses index to the respective table entries; and 

controlling an architecturally-visible data manipulation behavior or control
transfer behavior of the instruction based at least in part on a content of a table 
entry indexed by the address of the instruction. 

a. First ground of traverse: Richter '684 does not teach a table 
with entries "indexed by an address within an address space" 

Claim 2 recites a table "indexed by the address within an address space" of instructions 
fetched from the address space.

At best, Richter '684 shows circuitry that alters the behavior of instructions based on the 
address space in which instructions are located, not an address "within the address space."³



3 The references to the claim and to Richter '684 are bolded to assure the Examiner that Applicant
is not "merely pointing out what the claims require," but is instead "identifying a difference between the 
references and the claims and explaining how the claims are patentable over the prior art." 

These contrasts have been drawn in Applicant's previous papers, though the remarks in the Office 
Action suggest that these contrasts may have been overlooked. 

To understand this distinction, it is helpful to recollect that when a program executes on
an Intel X86, a program references several distinct segments. Each segment is a separate address
space, each of which is defined by a segment register. An Intel X86 program typically has one
or more code segments, one or more data segments, and a stack segment. (See Intel manual, vol. 1,
Exhibit 5, pages 3-7 to 3-10; Intel manual, vol. 3, Exhibit 7, pages 3-3 to 3-24, 3-33 to 3-34
and 4-12 to 4-14.) Each of these segments defines a different address space. Thus, address 0 in 
a first code segment may be a different memory location than address 0 in a second code 
segment, and each may be different than address 0 in any of the data segments, which may all be 
different than address 0 in the stack segment.⁴ This is a familiar notion: in essentially all
computers made since the Intel 80286 and Motorola 68010 were introduced in about 1985 (and
all large computers since about 1970), different programs have different address spaces, so that
a write by program A into its address 20, for example, does not conflict with the data that
program B has stored at its address 20. Intel takes this one step further, so that the program text,
the data and the stack for a single program may be stored in different address spaces; a store
into the stack segment is protected from data stored in the code and data segments. In the Intel
X86 architecture, each memory reference must designate a particular segment. For example, 
most branch instructions implicitly refer to addresses in the current code and data segment. 
PUSH and POP instructions implicitly refer to the current stack segment. Most MOVE 
instructions implicitly refer to the current data segment. A few instructions explicitly designate a 
particular segment. 
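The segment-as-address-space idea above can be shown with a deliberately simplified model in which each segment contributes a base (and a limit) in the linear address space. The sketch assumes flat base-plus-offset translation and ignores paging, privilege checks, and expand-down segments; the `Segment` type, the `linear_address` function, and the descriptor values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """Simplified protected-mode segment: a base in the linear address space plus a limit.

    Paging, privilege checks, and expand-down segments are ignored (assumption).
    """
    base: int
    limit: int  # highest valid offset within the segment


def linear_address(seg: Segment, offset: int) -> int:
    # Each segment is its own address space: the same offset lands at a different
    # linear address depending on which segment the memory reference designates.
    if not 0 <= offset <= seg.limit:
        raise ValueError("offset outside segment")  # a real X86 raises a protection fault instead
    return seg.base + offset


# Hypothetical descriptor contents for one program's code, data, and stack segments.
CS = Segment(base=0x10000, limit=0xFFFF)
DS = Segment(base=0x20000, limit=0xFFFF)
SS = Segment(base=0x30000, limit=0xFFFF)
```

Here offset 0 resolves to 0x10000, 0x20000, and 0x30000 in the code, data, and stack segments respectively; two segments given the same base would instead overlap in the linear address space.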

In light of that technological background, the distinction between the "table indexed by 
an address within an address space" of the claim and the segment-register based scheme of 
Richter '684 becomes clear. Richter '684 stores a bit combination in an Intel-like segment 
descriptor (col. 8, line 65 to col. 9, line 2; col. 9, lines 28-31; col. 10, lines 35-44) that designates 
whether an instruction is to be interpreted in the Intel X86 ISA or in the IBM/Motorola PowerPC 
ISA. Richter's bit combination is effective for the entire segment, that is, for all addresses 
within the address space designated by the descriptor. Richter '684 provides no way to specify
that two different addresses "within an address space" are to "index" to two different table
entries, as recited in the claim.

4 It may also be the same memory locations, in some cases: two or more segments may map to
the same origin in the linear address space. However, Richter does not discuss this special case, and thus
it is not inherent in Richter.

Because claim 2 recites a limitation absent from Richter '684, claim 2 is not obvious over
Richter '684.
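The distinction drawn in this subsection can be made concrete with a toy contrast. Both functions below are hypothetical illustrations written for this explanation; neither reproduces the circuitry of Richter '684 nor any embodiment of the specification. The first keys an attribute to the segment alone, so every offset in the segment receives the same answer. The second consults a table indexed by the address within the segment's address space (here the offset's 4 KB page, an assumed granularity), so two addresses in one segment can index two different entries.

```python
# Hypothetical granularity: one table entry per 4 KB page of offsets within a segment.
PAGE_SHIFT = 12


def attribute_per_segment(descriptor_attrs: dict, segment: str, offset: int) -> str:
    # One attribute per segment descriptor: the offset plays no part in the lookup.
    return descriptor_attrs[segment]


def attribute_per_address(table: dict, segment: str, offset: int) -> str:
    # Entry selected by the address within the segment's address space.
    return table[(segment, offset >> PAGE_SHIFT)]
```

With `descriptor_attrs = {"CS": "X86"}`, offsets 0x0010 and 0x1010 in CS both yield "X86"; with a table holding `("CS", 0): "X86"` and `("CS", 1): "RISC"`, the same two offsets yield different entries.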

b. Second ground of traverse: the written Office Action is 
inadequate to raise a prima facie rejection 

There is no obviousness rejection of these claims, because the Office Action fails to make 
the minimum showings required for a rejection to exist. 

Ever since it was initially filed, claim 2 has recited "a table indexed by an address of an 
instruction." This language has been distinctly pointed out in each of Applicant's three previous 
papers, and contrasted to the prior art (Response of February 4, 2002 at pages 4-5; Response of
August 10, 2001 at page 23, lines 23-25; Response of October 10, 2000 at page 11, lines 8-9). 
Two prior art references have been withdrawn based on this language. This language is not 
questioned under § 112; there appears to be no possible explanation for the omission.

MPEP § 2143.03 reads as follows (italic in original, underline added): 

2143.03 All Claim Limitations Must Be Taught or Suggested 

To establish prima facie obviousness of a claimed invention, all the claim 
limitations must be taught or suggested by the prior art. In re Royka, 490 F.2d
981, 180 USPQ 580 (CCPA 1974). "All words in a claim must be considered in
judging the patentability of that claim against the prior art." In re Wilson, 424 
F.2d 1382, 1385, 165 USPQ 494, 496 (CCPA 1970) 

Because claim 2 recites a limitation that is not considered by the § 103 portion of the Office
Action, no § 103 rejection exists.⁵ Because no rejection has been raised, no amendment is made
in response to any § 103 rejection.

c. All claims of Group I stand with claim 2 

For reasons discussed in section IX.B.1.b, there is no § 103 rejection of claim 2, or of any
other claim of Group I. As discussed in section IX.B.1.a, any attempt to reject claim 2 over
Richter '684 would fail on the merits. The claims of Group I are not rejected, and are allowable. 



5 Because no obviousness rejection exists, any rejection in the next Office Action will either be 
insufficient to be a rejection at all, or will be a "new ground of rejection"; in either case, no such future
rejection can be made final. 

2. Group II: claims 1, 7, 12, 14-18, 24-29, 42, 57, 60, 66 and 73

The Action of March 2002 purports to reject claim 14 over Richter '684. Claim 14 may 
be considered as a representative claim of a group that includes claims 1, 7, 12, 14-18, 24-29, 42, 
57, 60, 66 and 73 (Group II): if claim 14 stands, the rest of Group II stands with claim 14. If
claim 14 falls, then these other claims must be considered separately to the extent discussed 
elsewhere in this Response. 

Claim 14 recites as follows: 

14. A microprocessor chip, comprising: 
instruction pipeline circuitry; 
address translation circuitry; and 

a lookup structure having entries associated with corresponding address 
ranges generated by the instruction pipeline circuitry and translated by the address
translation circuitry, the entries describing a likelihood of the existence of an 
alternate coding of instructions located in the respective corresponding address 
range. 

a. First ground of traverse: Richter '684 does not teach "a 
likelihood of the existence of an alternate coding of 
instructions" 

Claim 14 recites a lookup structure with an entry that describes "a likelihood of the 
existence of an alternate coding of instructions." 

In contrast, Richter '684 teaches nothing remotely analogous. At best, the portions of 
Richter '684 indicated by the Office Action teach that different pieces of a program can be 
coded in two different ISA's. E.g., col. 2, lines 38-40; col. 5, lines 55-67. The portions
of Richter '684 indicated in the Office Action never indicate that a single program segment
might exist in two alternate codings. Without an "alternate coding," a table that indicates "a 
likelihood of the existence of an alternate coding" (as recited in claim 14) cannot possibly be 
suggested by Richter '684. 

b. Second ground of traverse: the written Office Action is 
inadequate to raise a prima facie rejection 

There is no obviousness rejection of these claims, because the Office Action fails to make
the minimum showings required for a rejection to exist.

Even in its initially filed form, claim 14 recited a lookup structure whose entries
"describe a likelihood of the existence of an alternate coding of instructions." In the past, this 
language has been indicated to be comprehensible - note that there was no objection to this 
language in the Action of December 2001, and claim 14 has been allowed over two other - * 
references based on this language. All three of Applicant's prior papers have drawn specific 
attention to this language. Applicant suggests that it is improper for this language to now be 
disregarded in a fourth Office Action. 

MPEP § 2143.03 reads as follows (bold and italic in original, citations omitted, underline
added):

2143.03 All Claim Limitations Must Be Taught or Suggested 

To establish prima facie obviousness of a claimed invention, all the claim limitations 
must be taught or suggested by the prior art. All words in a claim must be considered in 
judging the patentability of that claim against the prior art. . .. 

INDEFINITE LIMITATIONS MUST BE CONSIDERED 

A claim limitation which is considered indefinite cannot be disregarded. If a claim is 
subject to more than one interpretation, at least one of which would render the claim 
unpatentable over the prior art, the examiner should reject the claim as indefinite under 
35 U.S.C. 112, second paragraph (see MPEP § 706.03(d)) and should reject the claim
over the prior art based on the interpretation of the claim that renders the prior art 
applicable. ... 

LIMITATIONS WHICH DO NOT FIND SUPPORT IN THE ORIGINAL 
SPECIFICATION MUST BE CONSIDERED 

When evaluating claims for obviousness under 35 U.S.C. 103, all the limitations of the
claims must be considered and given weight, including limitations which do not find
support in the specification as originally filed (i.e., new matter). . . . [It] was error to
disregard [claim limitations that did not appear in the specification as filed] when
determining whether the claimed invention would have been obvious in view of the prior
art.

In contrast, the Office Action states (page 6, lines 17-19): 

For the reasons set forth in the section 112 rejections above, no statement can be 
made as to whether the Richter references meets the functional languages of the 
claims . . . 

The Office Action is in irreconcilable conflict with the MPEP. MPEP § 2143.03 is
unambiguous - it is examiner error to disregard claim limitations merely because the claim
language is subject to a rejection under § 112 ¶ 1 or ¶ 2. If a claim limitation is thought indefinite
or unsupported, MPEP § 2143.03 requires an examiner to make a good faith "best guess" as to
the broadest reasonable "interpretation of the claim," and compare the prior art against that "best
guess." It is impermissible to disregard the claim limitation entirely, as the Examiner confesses
to have done here.

Similarly, MPEP § 2143.03 makes no exception for "functional" language in an 
obviousness rejection. Such distinctions were made some decades ago in apparatus claims, but 
the law has changed, and no longer allows different treatment of "structural" and "functional" 
language (except in the context of § 112 ¶ 6 limitations - not an issue here). Further, there has
never been any proper basis to do what has been done here - disregard "functional" limitations in 
method claims. 

Applicant is unaware of any PTO regulation that gives an examiner the authority to 
ignore an instruction of the Director and Commissioner ordered through the MPEP. Applicant is 
similarly unaware of any subsequent Order or Notice that would supersede MPEP § 2143.03. If 
the Examiner is aware of either, he is requested to provide a copy. Unless the Examiner can 
supply such, Applicant suggests that compliance with MPEP § 2143.03 would be in order. 

Finally, it is well-established that where an agency employee acts in "brazen defiance" of 
agency regulations, that employee's action has no legal existence. Mayor and City Council of 
Baltimore v. Mathews, 562 F.2d 914, 920 (4th Cir. 1977); see also Certain Former CSA 
Employees v. Dept. of Health and Human Services, 762 F.2d 978, 984 (Fed. Cir. 1985) (action in 
violation of agency's own regulation is "illegal and of no effect"). Similarly, the absence of
required findings is fatal to the validity of the Office Action, regardless of whether there may be
evidence in the record to support proper findings. Anglo-Canadian Shipping Co. v. Federal
Maritime Com., 310 F.2d 606, 617 (9th Cir. 1962). In view of MPEP § 2143.03, a written 
rejection that omits claim limitations has no legal existence. 

3. Group III: Claims 1, 18-23, 30-34, 43, 58, 59, 67 and 78 

The Action of March 2002 purports to reject claim 19 over the Richter '684 patent. 
Claim 19 may be considered as a representative claim of a group that includes claims 1, 18-23, 
30-34, 43, 58, 59, 67 and 78 (Group III): if claim 19 stands, the rest of Group III stands with 
claim 19. If claim 19 falls, then these other claims must be considered separately to the extent 
discussed elsewhere in this Response. 
Claim 19 recites as follows:

19. A microprocessor chip, comprising: 
instruction pipeline circuitry; and 

interrupt circuitry cooperatively designed with the instruction pipeline
circuitry to trigger a synchronous interrupt of an instruction of a
process based at least in part on a memory state of the computer and the address
of the instruction, wherein the architectural definition of the instruction in the
instruction's native architecture does not call for an interrupt.



Claim 19 recites "interrupt circuitry ... to trigger an interrupt on execution of an 
instruction ... wherein the architectural definition of the instruction in the instruction's native 
architecture does not call for an interrupt." 

The portions of Richter '684 indicated by the Office Action disclose exactly the opposite 
of the claim. For example, Richter '684 states that he sets certain bits in a segment descriptor to 
"an invalid or reserved combination of bits" (col. 8, lines 65-66). Richter '684 states that this 
setting "could cause a prior-art x86 system to perform an undocumented function" (col. 9, lines 
4-6), and that Richter's system gains control when "an unknown opcode is detected by 
instruction decode" (col. 12, lines 4-5). The architectural definition of the X86 calls for an 
interrupt when the processor executes an "undocumented function" or "unknown opcode." (See 
Intel manual, vol. 1, Exhibit 5, page 4-12; Intel manual, vol. 2, Exhibit 6, pages A-5 n.l and A-7 
n.l; Intel manual, vol. 3, Exhibit 7, page 17-5.) Thus, these portions of Richter '684 cannot 
meet a claim limitation "wherein the architectural definition of the instruction . . . does not call 
for an interrupt." 

Because claim 19 recites a limitation that is absent from Richter '684, claim 19 is non- 
obvious over Richter '684. 

b. Second ground of traverse: the Office Action fails to consider 
each limitation of the claims, in violation of MPEP § 2143.03 

For reasons discussed in section IX.B.2.b, the Office Action fails to state an obviousness
rejection of any claim in Group III. For example, the language "[triggering] an interrupt on
execution of an instruction ... the architectural definition of the instruction not calling for an
interrupt" has been part of claim 19 since its original filing, and has been pointed out in each of
Applicant's papers, yet it has never been considered in the § 103 portion of any Office Action.

Because no Office Action has made any attempt to reject any claim of Group III over the
prior art, no such rejection has ever existed.

4. Group IV: Claims 1, 10-13, 26, 35, 39-48, 68 and 74 

The Action of March 2002 purports to reject claim 10 over Richter '684. Claim 10 may
be considered as a representative claim of a group that includes claims 1, 10-13, 26, 35, 39-48, 68 and
74 (Group IV): if claim 10 stands, the rest of Group IV stands with claim 10. If claim 10 falls, 
then these other claims must be considered separately to the extent discussed elsewhere in this 
Response. 

Claim 10 recites as follows: 

10. A microprocessor chip, comprising: 
instruction pipeline circuitry; 

table lookup circuitry designed to index into a table by a memory address 
of a memory reference arising during execution of an architecturally-defined 
instruction, and to retrieve a table entry corresponding to the address, the table 
entry being distinct from the memory referenced by the memory reference; 

the instruction pipeline circuitry being responsive to the contents of the 
table entry to alter a manipulation of data or control transfer behavior of the 
instruction in a manner incompatible with the architectural definition of the 
instruction in the instruction's native architecture. 

a. First ground of traverse: Richter '684 does not teach altering 
any instruction's behavior "in a manner incompatible with the 
architectural definition of the instruction" 

Claim 10 recites "[altering] a . . . behavior of the instruction in a manner incompatible 
with the architectural definition of the instruction." 

In contrast, Richter '684 teaches exactly the opposite. Richter '684 teaches switching 
ISA's so that every instruction is executed in a manner to improve compatibility of the execution 
of each instruction with the architectural definition of the instruction in the instruction's own 
architecture. See, e.g., col. 2, lines 34-43; col. 5, lines 54-67; col. 10, lines 34-45. 
b. Second ground of traverse: the Office Action fails to consider
each limitation of the claims, as required by MPEP § 2143.03

For reasons discussed in section IX.B.2.b, the Office Action fails to state an obviousness
rejection of any claim in Group IV. For example, since its original filing, claim 10 has recited
"[indexing] into a table by a memory address of a memory reference arising during execution of
an instruction" or similar language. This limitation has never been considered over the art in any
Office Action.

Because no Office Action has made any attempt to reject any claim of Group IV over the
prior art, no such rejection has ever existed.

5. Group V: Claims 1, 53, and 59-77 

Claims 59-77 are added to claim a new aspect of the invention, related to the aspect 
claimed in Group I (claims 2 and 50). Claim 70 may be considered as a representative claim of a 
group that includes claims 1, 53, and 59-77 (Group V). If claim 70 stands, the rest of Group V 
stands with claim 70. If claim 70 falls, then these other claims must be considered separately to 
the extent discussed elsewhere in this Response. 

Claim 70 recites as follows:

70. An apparatus, comprising: 
instruction pipeline circuitry; and 

table lookup circuitry designed to retrieve a table entry from a table whose 
entries are indexed by an address of an instruction fetched for execution, the table 
being stored in storage that is architecturally invisible to programs in the fetched 
instruction's native architecture; 

the instruction pipeline circuitry being responsive to a content of the table 
entry to control an architecturally-visible data manipulation behavior or control
transfer behavior of the fetched instruction based at least in part on a content of
the table entry associated with the address of the fetched instruction.

Claim 70 recites a table that is stored "in storage that is architecturally invisible to
programs."

In contrast, Richter's "instruction set type bit 21" is stored in the x86 segment 
descriptors and segment registers (col. 10, lines 34-37). The segment descriptors and segment 
registers are architecturally visible in the x86 architecture. (See Intel manual, vol. 1, Exhibit 5, 
pages 3-7 to 3-9; Richter '684, col. 9, lines 4-5). 
Because claim 70 recites a limitation that is absent from Richter '684, claim 70 is
non-obvious over Richter '684.

6. Dependent claims 

The pre-existing dependent claims, 3-9, 11-13, 15-18, 20-23, 26-29, 31-35, 40-48, 51, 52,
and 54, are purportedly rejected over the art. However, the written Office Action does not
address the limitations recited in these claims. Such piecemeal examination is discouraged by
37 C.F.R. § 1.105 and MPEP § 707.07(g). In particular, the following claim limitations are not
discussed in the § 103 section of the Office Action:



transfer of execution control to a second instruction for execution (claims 3, 44)

transfer of execution control to an instruction coded in an instruction set architecture (ISA)
different than the ISA of the executed instruction (claim 4)

entries of the table correspond to pages managed by a virtual memory manager (claims 8, 27,
48, 52, 55)

the table entries are indexed by a physical address (claim 56)

circuitry for locating an entry of the table is integrated with virtual memory address
translation circuitry of the computer (claims 8, 27)

triggering an interrupt on execution of an instruction . . . based at least in part on ... the
address of the instruction (claims 9, 12, 18, 29, 43)

a binary translator (claim 11)

pipeline control circuitry ... designed to initiate a determination of whether to transfer
control from an execution of the instruction ... to the second binary representation (claim 11)

returning control to an instruction flow of the process other than the instruction flow
triggering the interrupt (claims 13, 20, 31)

the table entry is an entry of a translation look-aside buffer (claim 15)

two different instruction flows that are logically equivalent to each other (claims 22, 33)

consulting a second table, the entries of the second table definitively indicating entry points
for initiating such alternate codings as exist (claim 57)



(Many of these limitations are not discussed in the § 112 ¶ 1 or ¶ 2 context, so even under the
view of the law espoused in the Office Action and discussed in section IX.B.2.b, the failure to
discuss these claims is inexplicable.) Because there has been no attempt to comply with the
minimum requirements for raising an obviousness rejection of these claims, no such rejection 
exists. 



C. Indefiniteness 

1. Legal infirmity of any "indefiniteness" rejections 

All "indefiniteness" rejections raised in this action are suspect. First, very nearly every 
issue has already been raised and resolved earlier in prosecution. Second, nearly every term used 
in the claims is a well-established term of art, used in its conventional sense - there is no 
"indefiniteness" in describing the invention in these terms. Third, the "indefiniteness" portions
of the Action attempt to apply rules that simply do not exist. 

a. Most of the "indefiniteness" concerns have been raised and 
resolved earlier in prosecution 

Very nearly all of the nominal rejections raised in this Office Action under § 112 ¶ 2 have
been previously raised and resolved to the Examiner's satisfaction. For example, the Office
Action of April 10, 2001 queried the phrase "architecturally-visible data manipulation
behavior," and Applicant provided an explanation in the Response of August 10, 2001, at page
15 (see section IX.C.4 at page 28, below, for an amplification of this explanation). This
explanation was accepted by the Examiner, as indicated by the fact that the query was dropped
from the Office Action of December 4, 2001.

Nonetheless, in order to demonstrate a good faith effort to advance prosecution, 
Applicant will go well beyond the minimum legal requirements for establishing patentability, 
and offer some explanatory assistance for claim language that has not been discussed earlier in 
prosecution. It should be understood that this explanation is provided as a convenient concrete 
frame of reference to assist examination of the application, and does not limit the claims to only 
the embodiments described here. 



A number of the purported § 112 ¶ 2 rejections state that the basis for the rejection is
that "The Examiner is unable to find their definitions in the specification." These claims
have been drafted in reliance on MPEP § 2111.01, which permits an applicant not to define
claim terms in the specification, and instead to rely on the common definition in the art. MPEP
§ 2111.01 instructs the Examiner that "When not defined by applicant in the specification, the
words of a claim must be given their plain meaning. In other words, they must be read as they
would be interpreted by those of ordinary skill in the art."

Most of the § 112 ¶ 1 and ¶ 2 issues raised in the Office Action relate to established terms
of art, used in the claims in their conventional senses. Because neither § 112 ¶ 2 nor the MPEP
provides a basis for rejecting established terms of art under § 112 ¶ 2, there are no "rejections" at
all. However, in an effort to usefully advance prosecution, these claim terms are discussed
below. If these issues are raised again, the Examiner is requested to identify a provision of the
MPEP that creates some exception to § 2111.01. The Examiner is requested not to raise
rejections or requirements that are not authorized.

(ii) "Support for" or "meaningful operation" 

Several of the purported § 112 ¶ 2 rejections use phrases such as "support for" or
"meaningful operation" that might be relevant under § 112 ¶ 1, but are inapplicable to any
known ground of rejection arising under § 112 ¶ 2. MPEP § 2174 makes clear that ¶ 1 and ¶ 2
impose distinct requirements, and that it is improper to pose a rejection under paragraph two
using reasoning that is only applicable in the context of paragraph one. In Carl Zeiss Stiftung v.
Renishaw PLC, 945 F.2d 1173, 1180-81, 20 USPQ2d 1094, 1100 (Fed. Cir. 1991), the Federal
Circuit held that there is no requirement that a claim recite every component required for an
"operable" device. These nominal "rejections" cannot be considered "rejections" at all.

2. "Each entry describing a likelihood of the existence of an alternate 
coding of instructions" 

The Office Action begins as follows: 

With respect to all independent claims, the recitation "each entry 
describing a likelihood of the existence of an alternate coding of instructions" is 
vague and indefinite. It is not clear whether there is or there is no alternate coding 
of instructions in the system for causing the pipeline to behave differently. Note 
that the definition of "likelihood" is "probability" in Webster's New Collegiate 
Dictionary. The probability of existence (instead of "true" or "false") would not 
result in two outcomes for designating two different behaviors. Note further that 
a pipeline which is a digital circuit is able to respond to only precise instructions 
and not probability. 

This paragraph raises a significant number of subsidiary issues that can best be handled one by
one. These subsidiary issues are discussed in the following sections, and then the remaining
issues in the paragraph are considered in section IX.C.14 at page 37, below.

3. "Architectural definition of an instruction" 

The Office Action states: 
The scope of the meaning of the following is not clear:

1. "the architectural definition of the instructions" . . . The Examiner is
unable to find their definitions in the specification.



a. This issue has been raised and resolved earlier in prosecution



This issue appears to have been raised in error. An almost identical issue was raised in
the Office Action of April 10, 2001, and the following response was provided in the Response of
August 10, 2001, at pages 15 and 17:

K. "alter — in manner incompatible with the architectural definition of the 
instruction" 

The Examiner requested information as to the meaning of "alter — in 
manner incompatible with the architectural definition of the instruction" as used 
in claim 10. 

As is well known in the art, an architecture defines the behavior of each 
instruction in its instruction set. For example, most architectures define an 
"ADD" instruction to cause the addition of two numbers, a "JUMP" instruction to 
transfer control to another instruction in accordance with defined rules, and 
similar definitions for all other instructions in the instruction set. . . . 

Applicant believes that the plain language of the clause would be [well]
understood by those in the art.

This was apparently accepted as a sufficient explanation, because the issue was dropped from the
Office Action of December 4, 2001.



The phrase "architectural definition of an instruction" is an established term of art. For
example, the concept appeared in undergraduate textbooks nearly 20 years ago (Tanenbaum,
Exhibit 3, pages 181-82) and has been used many times since (Hennessy & Patterson, Exhibit 4,
pages 89-92; Intel manual, vol. 2, Exhibit 6, pages A-1 to A-7). An "architectural definition of
an instruction" is the description of an instruction's behavior in the relevant computer's
architectural definition. There is no need to define established terms of art in the
specification.
4. "an architecturally-visible data manipulation behavior"

The Office Action states: 



The scope of the meaning of the following is not clear:

1. ... "an architecturally-visible data manipulation behavior" . . . The
Examiner is unable to find their definitions in the specification.

a. This issue has been raised and resolved earlier in prosecution



This issue appears to have been raised in error. An almost identical issue was raised in
the Office Action of April 10, 2001, and the following response was provided in the Response of
August 10, 2001, at page 15:

F. "architecturally-visible behavior" 

The Examiner requested information as to the meaning of "an 
architecturally-visible data manipulation behavior or control transfer behavior of
an instruction" as used in claim 1. 

"Architecturally-visible data manipulation behaviors or control transfer 
behaviors of an instruction" are discussed in the documents incorporated by 
reference into the specification at pages 138-139. As is well-known in the art, 
"architecturally visible behaviors" are behaviors that must be preserved across all 
implementations of an architecture. For example, the bit sequence "0000 0100" 
means "add an 8-bit immediate to the A register" in all implementations of the 
x86 architecture, from the 8086 in the mid-1970's to the Pentium III. On the 
other hand, pipeline hazard controls, the internal sequencing of a repeated string 
instruction, and the management of hardware resources during intermediate 
pipeline stages are not architecturally visible in most architectures. Most 
architecturally-visible results persist across instruction boundaries, most
non-architecturally-visible results do not. 6



6 "Architecturally visible" is a well-established term of art, as demonstrated by the industry 
papers and university course notes, at Exhibit 8. 

Another "rule of thumb" for distinguishing "architecturally-invisible" and "architecturally- 
visible" behaviors is that an element is generally considered "architecturally invisible" if no program can 
be written that can tell the difference between two different states of the element, and "architecturally 
visible" if a program can tell the difference. For example, a memory location that cannot be addressed, 
and that has no effect on the execution of any instruction, is "architecturally invisible." As another 
example, the difference between a hardware TLB fill and a software TLB fill is "architecturally invisible" 
at the application level, and "architecturally visible" to the operating system. See Tanenbaum, Exhibit 3, 
p. 181. The quality of an emulation system is reflected, at least partially, in the degree to which the 
emulation system itself is architecturally invisible to the emulated program. In any particular context, 
engineers in the art have no difficulty distinguishing "architecturally visible" and "architecturally 
invisible" components or behaviors from each other, and continually rely on clear and definite (though 
context-dependent) notions of architectural visibility and invisibility in designing computers. 

Response to Office Action of March 7, 2002 28 5231.16-4004C 09/429,094



9210792.3 




Some instructions have "architecturally-visible data manipulation
behavior" - for example, an ADD instruction calls for an addition of data, and all 
implementations of the architecture must achieve the same result. Other simple 
data manipulations include subtraction, multiplication, division, negation, logical 
and, logical or, logical exclusive or, complement, shift, bit field insert or extract, 
most format conversions, and most floating-point arithmetic operations, etc. 
("Data manipulation" behavior can be contrasted to "data movement" behavior of 
a memory load or store.)

Applicant believes that these concepts are well known in the art and that 
no amendments are necessary to address the issues raised by the Examiner. 

Some examples of controlling architecturally-visible instruction behavior
based on table entries are discussed in section VI.A (pages 100-101 of the
specification), section VI.B (pages 101-103), and section VI.D (pages 106-111).
The specification may include other examples. 

This was apparently accepted as a sufficient explanation, because the issue was dropped from the
Office Action of December 4, 2001.

Additional examples of "controlling an architecturally-visible data manipulation behavior
or control transfer behavior" appear at Fig. 6c and page 110 of the specification, at section II
(pages 44-46), and at section VI.E (pages 111-112).

Section IV of the specification (pages 64-73) discusses an embodiment in which a side 
table may alter the behavior of a routine data-manipulation instruction, an instruction that 
generates, as a primary output, a bit pattern that was not present in any of its inputs. Examples 
include add, subtract, logical or, logical and, floating-point-to-integer convert, or similar data- 
manipulation instructions. In section IV, either the ISA or calling convention bit may cause such 
an instruction to move data from a block of registers to memory or from memory into the register 
block, or may cause the hardware to convert from processing the X86 ISA to a RISC ISA. 

5. "control transfer behavior" 

The Office Action states: 

The scope of the meaning of the following is not clear: 
1. ... "control transfer behavior" in all the independent claims. The
Examiner is unable to find their definitions in the specification. 
a. This issue has been raised and resolved earlier in prosecution

This issue appears to have been raised in error. An almost identical issue was raised in 
the Office Action of April 10, 2001, and the following response was provided in the Response of 
August 10, 2001, at page 15: 

An instruction may have an "architecturally-visible control transfer
behavior" - for example, a JUMP instruction calls for a control transfer to a 
particular program location, and all implementations of the architecture must 
achieve the same result. 

This was apparently accepted as a sufficient explanation, because the issue was dropped from the 
Office Action of December 4, 2001.

b. § 112 ¶ 2 rejection of a well-established term of art is
unwarranted 

"Control transfer" is an established term of art, long known in undergraduate curricula. 
See, for example, Hennessy & Patterson, Exhibit 4, pages 103-109; Intel manual, vol. 2, Exhibit
6, pages 3-245 to 3-251; and the exhibits included with the Response of October 2000. No § 112 ¶ 2
rejection is warranted.

6. "wherein the architectural definition of the instruction in an emulated 
architecture does not call for an interrupt" 

The Office Action states: 

The scope of the meaning of the following is not clear: 

2. "wherein the architectural definition of the instruction in an emulated 
architecture does not call for an interrupt" of claims 19 and 30. The Examiner is 
unable to find the explanation of the wherein clause in the specification. 

a. An incorrect legal test is applied 

Section 112 ¶ 2 requires only that the claims be clear and definite, not that the
specification elaborate every claim limitation. Because an incorrect legal test has been applied, 
no rejection exists. 
b. This issue has been raised and resolved earlier in prosecution

This issue appears to have been raised in error. An almost identical issue was raised in 
the Office Action of April 10, 2001, and the following response was provided in the Response of 
August 10, 2001, at pages 14 and 21: 

E. "architectural definition not calling for an interrupt" 

The Examiner requested information as to the meaning of "the 
architectural definition of the instruction not calling for an interrupt" as used in 
claim 1.

An instruction "wherein the architectural definition of the instruction ... 
does not call for an interrupt" is an instruction that the architecture defines as 
executing without raising an interrupt. For example, in most computers, the 
definition of an integer add of two integer registers, where one register contains 
"2" and the other contains "3," does not call for an interrupt. On the other hand, 
as is well-known in the art, the definition of an SVC or TRAP instruction calls for 
an interrupt. In most architectures, a divide instruction, in cases where the divisor 
is zero, calls for an interrupt. Most architectures define that an access to an 
undefined memory location calls for an interrupt. Applicant believes that this 
concept is well known to those skilled in the art. 

Some examples of an instruction "wherein the architectural definition of 
the instruction . . . does not call for an interrupt" are discussed in section VI.A
(pages 100-101 of the specification), section VI.B (pages 101-103), many of the
probeable events 610 in Fig. 4b, and many of the X86 transfer of control
instructions discussed in section VI.D (pages 106-111).

This was apparently accepted as a sufficient explanation, because the issue was dropped from the 
Office Action of December 4, 2001.

7. "altering a manipulation of data or transfer of control behavior of the 
instruction in a manner incompatible with the architectural definition 
in an emulated architecture of the instruction" 

The Office Action states as follows: 

The scope of the meaning of the following is not clear: 

3. "altering a manipulation of data or transfer of control behavior of the 
instruction in a manner incompatible with the architectural definition in an 
emulated architecture of the instruction" of claim 39. The Examiner is unable to 
find the support or explanation of the clause. 
a. No rejection is raised - the "rejected" language does not
appear in claim 39 

This language does not appear in claim 39. No rejection is raised. 

b. This issue has been raised and resolved earlier in prosecution 

This issue appears to have been raised in error. An almost identical issue was raised in 
the Office Action of April 10, 2001, and the following response was provided in the Response of
August 10, 2001, at pages 15 and 17: 

K. "alter — in manner incompatible with the architectural definition of the 
instruction" 

The Examiner requested information as to the meaning of "alter — in 
manner incompatible with the architectural definition of the instruction" as used 
in claim 10. 

As is well known in the art, an architecture defines the behavior of each 
instruction in its instruction set. For example, most architectures define an 
"ADD" instruction to cause the addition of two numbers, a "JUMP" instruction to 
transfer control to another instruction in accordance with defined rules, and 
similar definitions for all other instructions in the instruction set. To "alter [an 
instruction's behavior] in a manner incompatible with the architectural definition 
of the instruction" is to cause an instruction to do something other than its 
architecturally-defined behavior. For example, an altered "ADD" instruction 
might perform a subtract, cause a transfer of control, or be an illegal instruction. 
An altered "JUMP" instruction might cause a control transfer to a destination 
other than the architecturally-defined destination, and/or cause a change in ISA 
under which further instructions are executed. Some examples are described in 
the specification at sections II (pages 44-46), IV (pages 64-73), VI.A (pages
100-101), VI.D (pages 106-111) and VI.E (pages 111-112). The specification may
include other examples. 

Applicant believes that the plain language of the clause would be [well] 
understood by those in the art. 

This was apparently accepted as a sufficient explanation, because the issue was dropped from the
Office Action of December 4, 2001.

"Architectural definition of an instruction" is discussed in section IX.C.3.b at page 27 of 
this paper. "Data manipulation behavior" is explained at section IX.C.4 at page 28 of this paper.
"Transfer of control behavior" is discussed at section IX.C.5 at page 29. 

Additional support may be found in the specification. One embodiment, using the probe
bits 624 of PFAT 172 and I-TLB 116, is disclosed in Figs. 6a-6c and section VI (pages 100-116).
A second embodiment, the CC bit 196, 200 of PFAT 172 and I-TLB 116, is disclosed in Figs.
2a-2c and section IV (pages 64-73). A third embodiment, the ISA bit 180, 182, 194 of PFAT
172 and I-TLB 116, is disclosed in Figs. 3a-3o and section II (pages 44-46).
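Purely as an illustration of the general concept described above, and not as a description of any disclosed embodiment, the following Python sketch shows how the contents of a per-page table entry might cause an instruction to depart from its architectural definition. The page size, the `PageEntry` type with its single `alter` bit, the `execute` function, and the toy two-instruction set are all hypothetical assumptions introduced only for this sketch.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page size (illustrative only)


@dataclass
class PageEntry:
    # Hypothetical one-bit attribute standing in for "the contents of the
    # table entry"; this is not a field named in the specification.
    alter: bool = False


def execute(op, a, b, pc, table):
    """Execute a toy ADD or JUMP; return (result, next_pc, switch_isa).

    Architectural definitions assumed here: ADD yields a + b and falls
    through to pc + 1; JUMP transfers control to address a in the same ISA.
    When the table entry for pc's page has `alter` set, each instruction
    behaves in a manner incompatible with that definition.
    """
    entry = table.get(pc // PAGE_SIZE, PageEntry())
    if op == "ADD":
        if entry.alter:
            return a - b, pc + 1, False  # altered: performs a subtract
        return a + b, pc + 1, False
    if op == "JUMP":
        if entry.alter:
            return None, a, True  # altered: same target, but the ISA changes
        return None, a, False
    raise ValueError(f"unknown opcode: {op}")


table = {1: PageEntry(alter=True)}  # only page 1 is marked
print(execute("ADD", 5, 3, 10, table))      # (8, 11, False): architectural
print(execute("ADD", 5, 3, 4100, table))    # (2, 4101, False): altered
print(execute("JUMP", 200, 0, 4100, table)) # (None, 200, True): ISA switch
```

Keying the table by page number is an assumption made for simplicity; the sketch makes no statement about how any embodiment indexes its table.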



8. "the architectural definition of the instruction with which the
alteration is incompatible is a definition in an emulated architecture"

The Office Action states as follows: 

The scope of the meaning of the following is not clear: 

4. "the architectural definition of the instruction with which the alteration 
is incompatible is a definition in an emulated architecture" in claims 36-38. 

"Architectural definition of an instruction" is discussed in section IX.C.3.b at page 27 of
this paper. See section IX.C.7 at page 31 for a discussion of an "alteration" of an instruction's
behavior.

Nonetheless, purely to assist in examination of the application (and not in response to any 
statutory rejection - none exists), Applicant has amended certain claims from "architectural 
definition of an instruction in an emulated architecture" to the "architectural definition of an 
instruction in the instruction's native architecture." Either phrase uses only terms of art in their 
conventional sense. For example, when a RISC machine emulates an X86 ADD instruction, the 
phrase refers to the definition of the ADD instruction in the X86 architecture, as opposed to the 
definition of an ADD instruction in the RISC architecture.⁷ Further, this amendment does not
narrow the claims. 

This phrase is composed of established terms of art, and thus no § 112 ¶ 2 rejection is
warranted.

9. "logically equivalent" 

The Office Action questions the use of the phrase "logically equivalent." 



7 It should be noted that this use of "native" is somewhat different than the use of "native" used 
in the specification, where "native" refers to the Tapestry RISC machine. However, because this use of 
"native" is always qualified by "the instruction's native architecture" or some similar phrase, the 
difference should always be clear in context. 
a. This issue has been raised and resolved earlier in prosecution

This issue appears to have been raised in error. An almost identical issue was raised in 
the Office Action of April 10, 2001, and the following response was provided in the Response of 
August 10, 2001, at pages [illegible]-20:

P. "logically equivalent" 

The Examiner requested information as to the meaning of "equivalent" in 
claim 22.... 

[A]t page 100, lines 10-17, the specification discusses one possibility for 
"logically equivalent" instruction text: RISC code, for example, produced by a 
binary translator, that performs similarly enough to the original X86 program that 
the two programs produce the same result. 

This was apparently accepted as a sufficient explanation, because the issue was dropped from the
Office Action of December 4, 2001.

10. "What actually the instruction pipeline circuitry does" 

The Office Action queries, "With respect to claim 10, it is not clear what actually the 
instruction pipeline circuitry does." 

a. This issue has been raised and resolved earlier in prosecution 

This issue appears to have been raised in error. A similar issue was raised in the Office
Action of April 2001. Applicant's paper of August 10, 2001 responded as follows, at pages
25-26:

E. "pipeline circuitry to effect control of instruction behavior" 

In the Office Action, the Examiner requested information as follows: 

Applicants are requested to identify the following components in the 
drawings and the description thereof in the specification: . . . 

5. the description of the pipeline circuitry to effect control of an 
architecturally-visible data manipulation behavior and control transfer
behavior. 

The Examiner is referred to sections III.F [reproduced at section IX.C.4 at page
28 of this paper] and III.K [reproduced at section IX.C.7 at page 31 of this paper]
of these Remarks. One specific embodiment is described in Figs. 6b and 6c, and
discussed in section VI.D (pages 106-111) of the specification. Other examples
are discussed in section VI.A (pages 100-101), and throughout section VI (pages
100-116).

This explanation was accepted in the Office Action of December 2001. Re-raising of a resolved 
issue appears to have been an oversight by the Examiner. 
b. The claim itself recites "what actually the instruction pipeline
circuitry does" 

In pertinent part, claim 10 recites as follows (emphasis added): 
10. A microprocessor chip, comprising:

instruction pipeline circuitry;

. . .

the instruction pipeline circuitry being responsive to the contents of the
table entry to alter a manipulation of data or control transfer behavior of the
instruction in a manner incompatible with the architectural definition of the
instruction in the instruction's native architecture.

Claim 10 itself recites what the "instruction pipeline circuitry" does: it responds "to the contents
of the table entry to alter a manipulation of data or control transfer behavior of the instruction in
a manner incompatible with the architectural definition of the instruction in the instruction's
native architecture." The instruction pipeline circuitry also performs the traditional functions of
"instruction pipeline circuitry," as that term is understood in the art.

In any particular "microprocessor chip," it will be clear whether or not the chip includes
"instruction pipeline circuitry" that responds to table contents as recited in the claim (in which
case the embodiment meets the claim), or does not have such "instruction pipeline circuitry" (in
which case it does not meet the claim). Section 112 ¶ 2 asks no more.

Unless it can be demonstrated that it might be ambiguous whether or not a particular
computer does or does not include "instruction pipeline circuitry" as recited in the claim, a
rejection under § 112 ¶ 2 is unwarranted.

c. An incorrect legal test has been applied

§ 112 ¶ 2 contains no requirement that a claim recite function for every structural
element. Because an erroneous legal test has been applied, no rejection exists.

11. "Function of the lookup structure"

The Office Action queries, "Claim 14 fails to recite function of the lookup structure." In
pertinent part, Claim 14 recites as follows (emphasis added):

Response to Office Action of March 7, 2002 35 5231.16-4004C 09/429,094

9210792.3

14. A microprocessor chip, comprising: 

a lookup structure having entries . . . describing a likelihood of the 
existence of an alternate coding of instructions located in the respective 
corresponding address range.

The "function of the lookup structure" is to "describ[e] a likelihood of the existence of an
alternate coding of instructions located in the . . . address range[s]" corresponding to the
respective entries.

12. "Functional relationship between the circuits and the lookup 
structure such that meaningful operation can be achieved" 

The Office Action queries, "Claim 14 further fails to recite functional relationship 
between the circuits and the lookup structure such that meaningful operation can be achieved." 

a. Claim 14 itself recites a "functional relationship between the 
circuits and the lookup structure" 

In pertinent part, claim 14 recites as follows (emphasis added): 

14. A microprocessor chip, comprising: 
instruction pipeline circuitry; 

a lookup structure having entries associated with corresponding address 
ranges generated by the instruction pipeline circuitry and translated by the address 
translation circuitry. . . .

The functional relationship is that the entries of the lookup structure are "associated with
corresponding address ranges generated by the instruction pipeline circuitry and translated by the
address translation circuitry."

b. No rejection can be raised on the stated grounds 

"[F]unctional relationship . . . such that meaningful operation can be achieved" is, at best, a
consideration under the "how to use" requirement of § 112 ¶ 1 or the utility requirement of
§ 101. There is no such thing as a rejection half based on § 112 ¶ 1 and half based on § 112 ¶ 2
or § 101. MPEP § 2174 specifically forbids the "mix and match" approach to rejecting claims
that is exemplified here:
2174 Relationship Between the Requirements of the First and Second
Paragraphs of 35 U.S.C. 112

The requirements of the first and second paragraphs of 35 U.S.C. 112 are separate
and distinct. ... 

Applicant respectfully requests that no future rejections be raised on grounds that are not
authorized by the MPEP.

13. "First table and second table" of claim 57 

The Office Action states as follows: 

Applicants are requested to identify the first table and the second table of 
claim 57 in the drawings and the description in the specification. 

On its face, this is a request for information, not a rejection of any claim. 

This question was answered in the Response of August 10, 2001, pages 10-11: 

The specification discusses several different "side tables" that are used to 
implement these inventions. ... A second side table (PFAT 172, 174 with its
probe bits 624) has entries that each correspond to regions of X86 instructions. 
The probe bits 624 of a single PFAT entry give an approximate indication, a 
likelihood estimate, of whether there is a translated code segment for any of the 
code in the region corresponding to the PFAT entry. A third side table (PIPM 
602) is consulted when the PFAT table access indicates that existence of 
translated code is likely. The PIPM then gives the final definitive answer of 
whether execution should be transferred to RISC code, and where the relevant 
RISC code is located in memory. 

The "first table" of claim 57, including probe bits 624 of PFAT 172 and the I-TLB, is
discussed further in section IX.C.14, below.

Embodiments of the "second table" of claim 57 include PIPM 602, shown in Figs. 1a and
6a-6c.
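To make the two-level arrangement described in the quoted passage concrete, the following Python sketch pairs a coarse per-page hint (loosely analogous to the probe bits 624 of PFAT 172) with an exact address map (loosely analogous to PIPM 602). It is a simplified sketch under stated assumptions, not the disclosed hardware: the page size, the class and method names, and the rule that adding a translation marks its entire page as "likely" (so that, in this sketch only, the hint may give false positives but never false negatives) are all hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size (illustrative only)


class TwoLevelLookup:
    """Coarse per-page hint backed by an exact address map (hypothetical)."""

    def __init__(self):
        self.hint = set()      # pages on which a translation is "likely"
        self.exact = {}        # start address -> location of translated code
        self.exact_probes = 0  # how often the slower exact map was consulted

    def add_translation(self, addr, location):
        self.exact[addr] = location
        self.hint.add(addr // PAGE_SIZE)  # marks the whole page, not just addr

    def lookup(self, addr):
        """Return the translated code location for addr, or None."""
        if addr // PAGE_SIZE not in self.hint:
            return None  # fast path: no translation considered likely here
        self.exact_probes += 1  # hint says "likely"; confirm or refute it
        return self.exact.get(addr)


t = TwoLevelLookup()
t.add_translation(0x1010, "translated block A")
print(t.lookup(0x1010))  # translated block A (hint set, exact hit)
print(t.lookup(0x1020))  # None (hint set, but no translation at this address)
print(t.lookup(0x9000))  # None (hint clear; exact map never consulted)
print(t.exact_probes)    # 2
```

The exact map is consulted only when the coarse hint indicates that a translation is likely; in the example, the lookup at 0x9000 never reaches it.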

14. "Each entry describing a likelihood of the existence of an alternate 
coding of instructions" 

The Office Action rejects a table whose "[entries describe] a likelihood of the existence
of an alternate coding of instructions."

a. This issue has been raised and resolved earlier in prosecution 

First, this issue appears to have been raised in error. The Response of February 4, 2002 
noted as follows, at page 6: 
Section VI of the specification, "Statistical probing," discusses the PFAT
(page frame attribute table) 172. Entries of PFAT 172 correspond to memory 
pages. Each PFAT table entry includes five bits 624 (see Fig. 6b) that indicate, in 
an approximate, statistical way, the "likelihood of the existence of an alternate 
coding of instructions" on the corresponding page (e.g., section VI.B (pages 101-103),
section VI.C (pages 103-106), section VI.D (pages 106-111)). PIPM 602 is
used to resolve the uncertainty remaining after consulting the statistical 
information stored in bits 624 of PFAT 172. 

This was clearly a satisfactory explanation, because the rejection over the Morley '982 reference
was withdrawn based on it. This, in combination with the Examiner's assurance in the interview
of March 7, 2002 that all claim terms were sufficiently understood to examine the claims,
suggests that a rejection for "indefiniteness" was not intended to be raised.

b. The Office Action fails to raise a legally-cognizable 
indefiniteness rejection 

Second, the claims are drafted in reliance on MPEP § 2173.04, which specifically
cautions that "Breadth Is Not Indefiniteness." A claim need not recite a limitation when the
claim is intended to cover all possibilities for that limitation. These claims are intended to cover
both cases where the "alternate coding" exists, and cases where it does not, and the claims are
drafted accordingly. The question asked in the Office Action is not properly raised in a § 112 ¶ 2
context.

Third, the Office Action does not raise any genuine issue of indefiniteness. For example,
the Action does not suggest that one of ordinary skill would have any difficulty in determining
whether or not a particular table meets the claims. This claim language is clear
enough to determine that neither Morley '982 nor Adachi '975 (in combination with the other
references cited in prior Office Actions) has such a table. Language that is understood is rarely
indefinite.

Fourth, the Office Action appears to overlook the rule, repeated often throughout Chapter 
2100 of the MPEP (emphasis added): 

2111 Claim Interpretation; Broadest Reasonable Interpretation 

CLAIMS MUST BE GIVEN THEIR BROADEST REASONABLE 
INTERPRETATION 

During patent examination, the pending claims must be "given the broadest 
reasonable interpretation consistent with the specification." . . .

2164.08 Enablement Commensurate in Scope With the Claims

All questions of enablement are evaluated against the claimed subject matter. . . . 

When analyzing the enabled scope of a claim, . . . claims are to be given their 
broadest reasonable interpretation that is consistent with the specification.

In replacing the word "likelihood" with the word "probability," it appears that this requirement 
may have been neglected. 

For these four reasons, no rejection exists. 

c. The factual premises of any "rejection" are incorrect 

Further, any rejection based on the language of "a likelihood of the existence of an 
alternate coding of instructions" is traversed for two further reasons. 

Fifth, the statement that "a pipeline which is a digital circuit is able to respond to only 
precise instructions and not probability" (Office Action of March 2002, page 2, lines 12-14) is 
wrong. For example, most branch-prediction circuits store only indications of "likely" future 
program flow, not a "precise" prediction of future program flow. Analogously, in some 
implementations, the "likelihood of the existence" feature of the claims may provide an 
approximate early indication of future events or a likely approximation of a current state. In 
some implementations, that indication may be later confirmed or refuted with better information. 
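By way of a generic illustration of the branch-prediction point above, the following Python sketch implements the familiar two-bit saturating-counter predictor. It is a well-known textbook technique offered only as an assumed example; it is not taken from this application or from U.S. Pat. No. 5,367,703, and the table size and address-modulo indexing are arbitrary choices.

```python
class TwoBitPredictor:
    """Branch history table of 2-bit saturating counters.

    Values 0-1 predict not-taken and 2-3 predict taken. A counter records
    only how likely its branch is to be taken, given recent history; the
    actual outcome, known later, confirms or refutes the prediction.
    """

    def __init__(self, size=16):
        self.size = size
        self.counters = [1] * size  # start "weakly not-taken" (arbitrary)

    def _index(self, pc):
        return pc % self.size  # simple direct indexing; aliasing is possible

    def predict(self, pc):
        return self.counters[self._index(pc)] >= 2

    def update(self, pc, taken):
        i = self._index(pc)
        if taken:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)


p = TwoBitPredictor()
for outcome in (True, True, False):  # taken, taken, then not taken
    p.update(0x40, outcome)
print(p.predict(0x40))  # True: one contrary outcome does not flip a strong history
```

Each counter holds only an indication of how likely its branch is to be taken, and that indication is later confirmed or refuted when the branch actually resolves.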

Sixth, in the interview of June 26, 2002, the Examiner indicated that he would prefer that 
this language be replaced with language more idiomatic to the art of branch prediction. 
Applicant submits that this claim language is idiomatic to the art, exemplified by claims 1 and 6 
of U.S. Pat. No. 5,367,703, titled "Method and System for Enhanced Branch History Prediction 
Accuracy in a Superscalar Processor System," issued in 1994, assigned to I.B.M. These claims 
read as follows: 

1. A method for enhanced branch history prediction accuracy in a
superscalar processor system which is capable of fetching and dispatching up to N 
instructions simultaneously, said method comprising the steps of: 

establishing a branch history table containing multiple predictive fields, 
each of said multiple predictive fields containing data indicative of a likelihood 
that execution of a particular associated instruction will result in a branch within 
an executing set of instructions; 
utilizing an associated one of said M predictive fields to determine a
likelihood that execution of a corresponding instruction within said ordered 
sequence of M instructions will ... 

6. A system . . . said system comprising:

means for establishing a branch history table containing multiple 
predictive fields, each of said multiple predictive fields containing data indicative 
of a likelihood that execution of a particular associated instruction will result in a 
branch within an executing set of instructions; 

means for utilizing an associated one of said M predictive fields to 
determine a likelihood that execution of a corresponding instruction within said 
ordered sequence of M instructions will ... 

A number of other patents directed to branch prediction also use the word "likelihood" in a
manner closely analogous to the usage in these claims. Note the variety of companies that have
used this language - such broad use indicates that it is idiomatic to the art, and that this language
is to "be read as [it] would be interpreted by those of ordinary skill in the art." MPEP § 2111.01.
Further understanding of probe bits 624 may be gained by considering the alternative
embodiments shown in Figs. 6a, 6b and 6c, discussed in section VI.A (pages 100-101), at page
102, lines 6-19, and the first half of section VI.D at pages 106-108, and more generally
throughout sections VI.B, VI.C and VI.D (pages 101-111). Probeable events 610 and the probe
mask 620 are also relevant, as discussed in sections VI.B, VI.C and VI.D (pages 101-111). The
PFAT is shown in Figs. 1a, 1b and 1d, and discussed throughout the specification, particularly in
section I.A (pages 29-31), section I.C (pages 35-36), section VI.B (pages 101-103), section VI.C
(pages 103-106), section VI.D (pages 106-111), and section VI.G (pages 113-115).

D. Issues nominally arising under § 112 ¶ 1

The Office Action raises a number of "rejections" nominally based on § 112 ¶ 1.
1. The Office Action is insufficient to raise any enablement rejection of
any claim

a. Almost all issues raised in the "enablement" section of the
Office Action have been raised and previously resolved to the
Examiner's satisfaction

Almost all of the enablement issues of this Office Action, questioning only where certain
teaching exists in the specification, have been raised earlier in prosecution, in the guise of § 112
¶ 2 "rejections." These previously-raised issues were resolved to the Examiner's satisfaction, as
noted by the absence of these issues from the Office Action of December 2001. It is believed
that the discussion in Applicant's previous papers fairly meets nearly all of the issues raised in
the § 112 ¶ 1 portion of the March 2002 Office Action.

For each such resolved issue, the discussion previously found acceptable by the Examiner 
is repeated below. 

b. The nominal § 112 ¶ 1 "rejections" apply the wrong legal test

Nearly all of the nominal § 112 ¶ 1 "rejections" raised in this Office Action give as the
reason, "The specification fails to disclose." This is not a proper test under the "enablement"
requirement of § 112 ¶ 1. The test for enablement is "undue experimentation." A claim
limitation may well be "enabled" by the knowledge of one of ordinary skill, with no disclosure
whatsoever in the specification.

MPEP § 2164.04 reads as follows (bold in original, quotations and citations omitted, 
underline added): 

2164.04 Burden on the Examiner Under the Enablement Requirement 

Before any analysis of enablement can occur, it is necessary for the
examiner to construe the claims. For terms that are not well-known in the art, or
for terms that could have more than one meaning, it is necessary that the examiner
select the definition that he/she intends to use when examining the application,
based on his/her understanding of what applicant intends it to mean, and explicitly
set forth the meaning of the term and the scope of the claim when writing an
Office action.

In order to make a rejection, the examiner has the initial burden to 
establish a reasonable basis to question the enablement provided for the claimed 
invention. . . . A specification disclosure which contains a teaching of the manner 
and process of making and using an invention in terms which correspond in scope 
to those used in describing and defining the subject matter sought to be patented 
must be taken as being in compliance with the enablement requirement of 35 
U.S.C. 112, first paragraph, unless there is a reason to doubt the objective truth of
the statements contained therein which must be relied on for enabling support. . . .
As stated by the court, "it is incumbent upon the Patent Office, whenever a
rejection on this basis is made, to explain why it doubts the truth or accuracy of
any statement in a supporting disclosure and to back up assertions of its own with
acceptable evidence or reasoning which is inconsistent with the contested
statement. Otherwise, there would be no need for the applicant to go to the trouble
and expense of supporting his presumptively accurate disclosure."

According to In re Bowen, 492 F.2d 859, 862-63, 181 USPQ 48, 51
(CCPA 1974), the minimal requirement is for the examiner to give reasons for the
uncertainty of the enablement. ... 

. . . For example, doubt may arise about enablement because information is 
missing about one or more essential parts or relationships between parts which 
one skilled in the art could not develop without undue experimentation. In such a 
case, the examiner should specifically identify what information is missing and 
why one skilled in the art could not supply the information without undue
experimentation. However, specific technical reasons are always required.

In accordance with the principles of compact prosecution, if an 
enablement rejection is appropriate, the first Office action on the merits should 
present the best case with all the relevant reasons, issues, and evidence so that all
such rejections can be withdrawn if applicant provides appropriate convincing
arguments and/or evidence in rebuttal.

MPEP § 2164.04 states a number of requirements for an enablement rejection. In contrast, the 
Office Action merely accuses certain claim language, and meets none of the further analytical 
requirements set out in MPEP § 2164.04. Prosecution cannot advance when an Office Action 
fails to identify issues clearly, or state a rejection with sufficient detail to allow a focused 
response. An applicant should not be left to guess what the Examiner's unstated concerns might 
be. 

c. Statements in the specification must be accepted at face value 

Finally, the MPEP requires the Examiner to "take an applicant's word" for the utility and
enablement of the specification (emphasis added):

2107.02 Procedural Considerations Related to Rejections for Lack of Utility 
An Asserted Utility Creates a Presumption of Utility 

As a matter of Patent Office practice, a specification which contains a 
disclosure of utility which corresponds in scope to the subject matter sought to be 
patented must be taken as sufficient to satisfy the utility requirement of § 101 for
the entire claimed subject matter unless there is a reason for one skilled in the art
to question the objective truth of the statement of utility or its scope. 

See also the excerpts from § 2164.04 set out immediately above. 

Applicant's papers have provided extensive indications of the locations in the
specification that teach the how's and why's of the invention. Several more examples are
presented throughout this paper. MPEP § 2164.05 warns that these statements in the
specification must be presumed to be correct, and any enablement rejections must be supported
by technical reasoning showing that the specification is incorrect: "The examiner should never
make the [enablement] determination based on personal opinion" (bold and underline in
original). Because there is no showing that any statement in the specification is incorrect or
incredible, the Office Action is insufficient to raise any enablement rejection.

Nonetheless, even though no rejections have been raised, in order to effectively advance 
prosecution, the Examiner's concerns will be answered as best understood. 

d. Rejection of language that does not appear in the claims 

The Office Action purports to reject a number of phrases under § 112 ¶ 1. Even before
amendment, most of these phrases did not appear in any claim. To consider one example, no
claim recites "the interrupt criteria being based at least in part on the likelihood ..." Rather,
claim 1, even before amendment, recited "the interrupt criteria [are] based at least in part on [a]
table entry." Raising an interrupt based on a table entry would not require undue
experimentation.

Because these paragraphs of the Office Action do not relate to any claim in the 
application, no rejection is raised. 

2. "a table lookup circuitry having entries describing a likelihood of the 
existence of an alternate coding of instructions" 

The Office Action states "The specification fails to disclose a table lookup circuitry
having entries describing a likelihood of the existence of an alternate coding of instructions."

a. This "rejected" language does not appear in the claims; no 
"undue experimentation" has been shown 

The "rejected" language does not appear in any claim, and the Office Action does not
indicate which claim is intended. See further discussion of this problem at section IX.D.1.d at
page 43, above. Nor does the Office Action describe any basis to believe that "undue
experimentation" would be required, or challenge the credibility of any statement in the
specification. See further discussion of this problem at sections IX.D.1.b and IX.D.1.c at page
41, above. For these two reasons, no rejection is raised.

The closest language that does appear in the claims is discussed in section IX.C.14 at 
page 37, above. 

b. This issue has been raised and resolved earlier in prosecution 

This issue appears to have been raised in error. All of the elements of the phrase 
questioned by the Examiner have been discussed at some length during prior prosecution. See, 
for example, section IX.C.14 at page 37, above. 

c. The Office Action is inadequate to raise a rejection 

Because the location of supporting disclosure has been identified during prior 
prosecution, MPEP § 2164.04 requires that (bold and italic in original, underline added): 

2164.04 Burden on the Examiner Under the Enablement Requirement 

. . . A specification disclosure which contains a teaching of the manner and 
process of making and using an invention in terms which correspond in scope to 
those used in describing and defining the subject matter sought to be patented 
must be taken as being in compliance with the enablement requirement of 35 
U.S.C. 112, first paragraph, unless there is a reason to doubt the objective truth of
the statements contained therein which must be relied on for enabling support. . . .
As stated by the court, "it is incumbent upon the Patent Office, whenever a
rejection on this basis is made, to explain why it doubts the truth or accuracy of
any statement in a supporting disclosure and to back up assertions of its own with
acceptable evidence or reasoning which is inconsistent with the contested
statement.

According to In re Bowen, 492 F.2d 859, 862-63, 181 USPQ 48, 51 
(CCPA 1974), the minimal requirement is for the examiner to give reasons for the
uncertainty of the enablement. . . . 

. . . For example, doubt may arise about enablement because information is 
missing about one or more essential parts or relationships between parts which 
one skilled in the art could not develop without undue experimentation. In such a 
case, the examiner should specifically identify what information is missing and 
why one skilled in the art could not supply the information without undue
experimentation. However, specific technical reasons are always required.
A mere indication of claim language and a bald statement that "The specification does not
disclose . . ." is inconsistent with earlier actions taken by the Examiner, and fails to meet the
requirement for "specific technical reasons" for believing undue experimentation might be
required.

Because steps required to raise a rejection have not been performed, no rejection exists.

3. "interrupt circuitry which triggers an interrupt in accordance with 
interrupt criteria on execution of an instruction, wherein the 
architectural definition of the instruction does not call for an 
interrupt, the interrupt criteria being based at least in part on the 
likelihood of the existence of an alternate coding of instructions 
(probability)" 

The Office Action states "The specification fails to disclose an interrupt circuitry which 
triggers an interrupt in accordance with interrupt criteria on execution of an instruction, wherein 
the architectural definition of the instruction does not call for an interrupt, the interrupt criteria 
being based at least in part on the likelihood of the existence of an alternate coding of 
instructions (probability)." 

a. This "rejected" language does not appear in the claims; no 
"undue experimentation" has been shown 

The "rejected" language does not appear in any claim, and the Office Action does not
indicate which claim is intended. See further discussion of this problem at section IX.D.1.d at
page 43, above. Nor does the Office Action describe any basis to believe that "undue
experimentation" would be required, or challenge the credibility of any statement in the
specification. See further discussion of this problem at sections IX.D.1.b and IX.D.1.c at page
41, above. For these two reasons, no rejection is raised.

b. This issue has been raised and resolved earlier in prosecution 

This issue appears to have been raised in error. Nearly identical issues have been raised 
and resolved to the Examiner's satisfaction earlier in prosecution. For example, the Response of 
August 10, 2001 reads as follows, at page 21:

C. "interrupt circuitry and the handler" 

In the Office Action, the Examiner requested information as follows: 
Applicants are requested to identify the following components in the
drawings and the description thereof in the specification: ...

3. the interrupt circuitry and the handler,

One particular example of the interrupt circuitry and handler are disclosed in Figs.
6b and 6c, and discussed in section VI.D (pages 106-111) of the specification.⁸
Other examples are discussed throughout section VI (pages 100-116). Other
examples include the hardware and software that handle the ISA flag bit 180
(section II, pages 44-46), and the CC flag bit 200 (section IV, pages 64-73).

This explanation was deemed fully acceptable by the Examiner, as evidenced by the withdrawal
of this concern in the Office Action of December 2001.⁹

The next component of the phrase queried about in the current Office Action, 
"architectural definition of the instruction in an emulated architecture does not call for an 
interrupt," was similarly addressed in the same Response at page 14, as discussed in section
IX.C.6 at page 30 of this paper. That explanation was also deemed acceptable, because the 
question was not re-raised in the Action of December 2001. To applicant's knowledge, this 
element is not known in the art, but it is enabled and could be practiced without undue 
experimentation, as discussed in the portions of the specification indicated in section IX.C.6. 

The remainder of this phrase, referring to the "likelihood," is discussed in section 
IX.C.14 at page 37 of this paper. As noted there, this phrase also was fully explained to the 
Examiner's satisfaction in Applicant's Response of August 10, 2001. 

For reasons discussed in section IX.D.2 at page 44, no rejection exists.

8 To narrow this indication somewhat, a first embodiment of the interrupt circuitry is shown in
Fig. 6b, the upper half of Fig. 6c, and the first half of section VI.D at pages 106-108. Section VI (pages
101-111) is relevant in its entirety.

9 Other examples of an "interrupt handler being responsive to a table entry" are shown in the
lower half of Fig. 6c, and at section VI.D, especially pages 109-110.

4. "a handler being responsive to the likelihood of the existence of an
alternate coding of instructions to affect the instruction pipeline
circuitry to effect control of an architecturally-visible data
manipulation behavior or control transfer behavior of the
instruction"

The Office Action states, "The specification fails to disclose a handler being responsive
to the likelihood of the existence of an alternate coding of instructions to affect the instruction
pipeline circuitry to effect control of an architecturally-visible data manipulation behavior or
control transfer behavior of the instruction as claimed."

a. This "rejected" language does not appear in the claims; no
"undue experimentation" has been shown

The "rejected" language does not appear in any claim, and the Office Action does not
indicate which claim is intended. See further discussion of this problem at section IX.D.1.d at
page 43, above. Nor does the Office Action describe any basis to believe that "undue
experimentation" would be required, or challenge the credibility of any statement in the
specification. See further discussion of this problem at sections IX.D.1.b and IX.D.1.c at page
41, above. For these two reasons, no rejection is raised.

b. This issue has been raised and resolved earlier in prosecution 

This issue appears to have been raised in error. Nearly identical issues have been raised 
and resolved to the Examiner's satisfaction earlier in prosecution. 

For example, the "handler" was discussed in the Response of August 10, 2001, and that 
explanation was accepted in the Action of December 2001, as explained in section IX.D.3 at 
page 45 of this paper. 

The "table whose entries indicate a likelihood of the existence of an alternate coding" 
was discussed in the Response of August 10, 2001, and that explanation was accepted in the 
Action of December 2001, as explained in section IX.C.14 at page 37 of this paper. 

"Effecting control of an architecturally-visible data manipulation behavior or control 
transfer behavior" was explained to the Examiner's satisfaction in the Response of August 10, 
2001, as discussed in sections IX.C.4 and IX.C.5, at pages 28 and 29 of this paper.

For reasons discussed in section IX.D.2 at page 44, no rejection exists. 

5. "an instruction pipeline circuitry being affected by the handler being 
responsive to the likelihood of the existence of an alternate coding of 
instructions to effect control of an architecturally-visible data 
manipulation behavior or control transfer behavior of the 
instruction" 

The Office Action states, "The specification fails to disclose an instruction pipeline
circuitry being affected by the handler being responsive to the likelihood of the existence of an
alternate coding of instructions to effect control of an architecturally-visible data manipulation
behavior or control transfer behavior of the instruction as claimed."

Response to Office Action of March 7, 2002
9210792.3
5231.16-4004C 09/429,094



a. This "rejected" language does not appear in the claims; no
"undue experimentation" has been shown

The "rejected" language does not appear in any claim, and the Office Action does not
indicate which claim is intended. See further discussion of this problem at section IX.D.1.d at
page 43, above. Nor does the Office Action describe any basis to believe that "undue
experimentation" would be required, or challenge the credibility of any statement in the
specification. See further discussion of this problem at sections IX.D.1.b and IX.D.1.c at page
41, above. For these two reasons, no rejection is raised.

b. This issue has been raised and resolved earlier in prosecution 

This issue appears to have been raised in error. Nearly identical issues have been raised 
and resolved to the Examiner's satisfaction earlier in prosecution. For example, the Response of 
August 10, 2001 provides a description of "instruction pipeline circuitry being affected by the 
handler," at page 19: 

O. "Function of the interrupt handler" 

The Examiner requested information as to the meaning of the "Function of 
the interrupt handler" as recited in claim 21. Claims 20 and 21 recite, in pertinent 
part: 

20. The microprocessor chip of claim 19, further comprising: 

interrupt handler software designed to service the interrupt and to 
return control to an instruction flow of the process other than the 
instruction flow triggering the interrupt, the returned-to instruction flow 
for carrying on non-error handling normal processing of the process. 

21. The microprocessor chip of claim 20, wherein the interrupt
handler software is programmed to change an instruction set architecture 
under which instructions are interpreted by the computer. 

As the Examiner observes, the interrupt handler of claim 21 does some things that 
are not "common." Claim 20 recites that the handler is "designed to service the 
interrupt and to return control to an instruction flow of the process other than the 
instruction flow triggering the interrupt." Claim 21 recites that "the interrupt 
handler software is programmed to change an instruction set architecture." 

This explanation was regarded as sufficient, because the issue was dropped from the Action of
December 2001.
Further examples of "instruction pipeline circuitry being affected by the handler" are
shown in the lower half of Fig. 6c; section VI.D, especially pages 109-110; see also section VI
(pages 100-116), section II (pages 44-46), and section IV (pages 64-73).

Any issue relating to the language "being responsive to the likelihood of the existence of
an alternate coding of instructions" has been resolved earlier in prosecution, and re-raising a
rejection now is inappropriate, as discussed in section IX.C.14 at page 37 above.

"Effecting control of an architecturally-visible data manipulation behavior or control 
transfer behavior" was explained to the Examiner's satisfaction in the Response of August 10, 
2001, as discussed in sections IX.C.4 and IX.C.5, at pages 28 and 29 of this paper.

For reasons discussed in section IX.D.2 at page 44, no rejection exists. 

6. "an instruction pipeline circuitry being responsive to the likelihood of 
the existence of an alternate coding of instructions to alter a 
manipulation behavior or control transfer behavior of the instruction 
in a manner incompatible with the architectural definition of the 
instruction"

The Office Action states "The specification fails to disclose an instruction pipeline 
circuitry being responsive to the likelihood of the existence of an alternate coding of instructions 
to alter a manipulation behavior or control transfer behavior of the instruction in a manner 
incompatible with the architectural definition of the instruction as claimed in claims 10 and 39 
for example." 

a. This "rejected" language does not appear in the claims; no 
"undue experimentation" has been shown 

The "rejected" language does not appear in either claim 10 or 39. See further discussion
of this problem at section IX.D.1.d at page 43, above. Nor does the Office Action describe any
basis to believe that "undue experimentation" would be required, or challenge the credibility of
any statement in the specification. See further discussion of this problem at sections IX.D.1.b
and IX.D.1.c at page 41, above. For these two reasons, no rejection is raised.

b. This issue has been raised and resolved earlier in prosecution 

This issue appears to have been raised in error. Any issue relating to the language "being 
responsive to the likelihood of the existence of an alternate coding of instructions" has been 
resolved earlier in prosecution, as discussed in section IX.C.14 at page 37 above. Re-raising a
rejection now is inappropriate. 

"Altering a data manipulation behavior or control transfer behavior of the instruction" 
was explained to the Examiner's satisfaction in the Response of August 10, 2001, as discussed in
sections IX.C.4 and IX.C.5, at pages 28 and 29 of this paper.

"Altering an [instruction's behavior] in a manner incompatible with the architectural 
definition of the instruction" is a resolved issue, as explained in section IX.C.7 at page 31.

For reasons discussed in section IX.D.2 at page 44, no rejection exists. 

7. "control of architecturally-visible data manipulation behavior 
includes changing an instruction set architecture under which 
instructions are interpreted" 

The Office Action states "The specification fails to disclose the control of architecturally- 
visible data manipulation behavior includes changing an instruction set architecture under which 
instructions are interpreted by the computer of claim 5." 

The Office Action does not describe any basis to believe that "undue experimentation" 
would be required, nor challenge the credibility of any statement in the specification. 
Accordingly, no rejection exists. 

Examples are found throughout the specification. Attention is drawn specifically to 
section II (pages 44-46), page 110, lines 6-8; see generally section VI (pages 100-116).

8. "an interrupt circuitry to trigger an interrupt in accordance with 
synchronous interrupt criteria being based on a memory state and 
wherein the architectural definition of the instruction in an emulated 
architecture does not call for an interrupt" 

The Office Action states "The specification fails to disclose an interrupt circuitry to 
trigger an interrupt in accordance with synchronous interrupt criteria being based on a memory 
state and wherein the architectural definition of the instruction in an emulated architecture 
does not call for an interrupt" (emphasis in original). 

"Synchronous interrupt" is an established term of art, as discussed below in section
IX.D.9 at page 51. No undue experimentation is required.
"Memory state" is an established term of art, as discussed below in section IX.D.10 at
page 52. "Synchronous interrupts" based on "a memory state" do not require undue
experimentation; for example, TLB fills and page faults are known.

"Triggering an interrupt [for an instruction whose] architectural definition ... does not
call for an interrupt" is discussed in section IX.C.6 at page 30.

"Architectural definition of an instruction" is discussed in section IX.C.3.b at page 27 of 
this paper. 

One embodiment is described particularly in Fig. 6b, and the upper half of Fig. 6c. Also 
relevant are col. 610 of Fig. 4b; page 106, lines 9-20; see also section VI.A (pages 100-101),
section VI.B (pages 101-103), and section VI.D (pages 106-111).

9. "Synchronous interrupt criteria" 

The Office Action states, "The specification . . . fails to explain what synchronous 
interrupt criteria . . . are."

The Examiner may recall that claim 1, as originally filed, recited 

interrupt circuitry cooperatively designed with the instruction pipeline 
circuitry to trigger an interrupt on execution of an instruction of a process, 
synchronously based at least in part on a memory state of the computer and the 
address of the instruction. . . 

The Office Action of April 2001 rejected claim 1 because "Claim 1 fails to recite what event 
causes the interrupt." Even though the rejection was factually and legally groundless,¹⁰ the claim
was amended to explicitly recite "criteria" that cause the interrupt. In this Response, the claim is 
amended largely to its original form, to eliminate the word "criteria" and return to the original 
and unarguably well-established term, "synchronous interrupt." 

Applicant does not concede that any enablement rejection was properly raised; MPEP
§ 2164.01 instructs that "A patent need not teach, and preferably omits, what is well known in
the art." "Synchronous interrupts" are known in the art, and no "undue experimentation" would
be required to design a computer that raised interrupts based on desired synchronous interrupt
criteria. A quick search of the U.S. patent database on Westlaw reveals that seventy patents
issued since 1970 use the phrase "synchronous interrupt." See also Tanenbaum, Exhibit 3, pages
263-269; Hennessy & Patterson, Exhibit 4, pages 214-220; Intel manual, vol. 1, Exhibit 5,
pages 4-10 to 4-11. No enablement rejection is properly raised.

10 The claim recited that the interrupt was triggered "based at least in part on a memory state of
the computer and the address of the instruction." The statement that "fails to recite what event causes the
interrupt" was simply incorrect. Further, there is no requirement under § 112 ¶ 2 that a prior cause be
recited for every step in a claim.

Synchronous interrupts arise out of the execution of instructions. Examples include 
TRAP or SVC exceptions, divide-by-zero exceptions, page faults, and TLB misses. In contrast, 
"asynchronous interrupts" are not directly connected to an instruction. Examples of 
asynchronous interrupts include most interrupts from peripheral devices, timers, or power-fail. 

The claim language "synchronous interrupt," and analogous concepts, are supported at
page 18, line 29; page 19, line 16; page 25, line 29; claim 1 (as originally filed), line 12; Fig. 6b
and the upper half of Fig. 6c; by section VI of the specification (pages 101-117), particularly the
discussion of the probe interrupt in the first half of section VI.D at pages 106-108; by the
ISA-change exception described in sections II (pages 44-46) and III.C-III.H (pages 51-64); or
by the calling-convention change exception described in section IV (pages 64-73); see also Figs.
3a-3o and section II (pages 44-46); Figs. 2a-2c and section IV (pages 64-73).

10. "Memory state" 

"Memory state" is an established term of art. For example, a Westlaw search of the 
patent database reveals that over 2,400 patents have used the term "memory state" or "state of 
memory" since 1970. For example, the second sentence of U.S. Pat. No. 6,035,376 reads: "The 
present invention relates to a system and method for maintaining cache coherence that is event 
driven and changes the state of the caches and memories based on the current memory state and 
a head of a list of corresponding cache entries." U.S. Pat. No. 5,829,032 uses the sentence "A 
memory tag state controller 126 generates a state signal for cache coherency based on the 
memory tag state." One of ordinary skill in the art would not have to resort to undue 
experimentation to achieve a desired memory state, or to have hardware respond as desired to a 
memory state. 

Various recitations of "memory state" are supported by one or more of the following: 
Fig. 1a; Figs. 2a-2c; page 18, line 19; page 19, line 17; claim 1 (as originally filed), line 13;
section II (pages 44-46), particularly the discussion of ISA bits 180, 182, 194; section IV (pages
64-73), particularly the discussion of the "calling convention" bit 196, 200; Figs. 6a-6c, and
section VI of the specification (pages 110-116), particularly the discussion of probe bits 624 and
the PIPM table 602 (Physical IP Map, described in most detail in sections VI.A and VI.D, pages
110-112 and 108-111), and page 102, lines 6-19.

11. what the wherein clause ["wherein the architectural definition of the 
instruction in an emulated architecture does not call for an 
interrupt"] means 

The Office Action states "The Examiner is unable to find the explanation of the wherein 
clause in the specification." 

Earlier in prosecution, the Examiner requested the same information, and it was provided
to the Examiner's satisfaction. See section IX.C.6 at page 30. 

If there is to be an enablement rejection, the burden is on the Examiner to show a
reasonable basis to believe that some statement is unlikely to be true. See MPEP §§ 2164.04 and
2164.05, and sections IX.D.1.b and IX.D.1.c at page 41. Because the Office Action attempts no
such showing, no rejection exists. 

In view of the amendments and remarks, Applicant respectfully submits that no claim is 
rejected by the Office Action of March 2002. Further, the claims are in condition for allowance. 
Applicant requests that the application be passed to issue in due course. The Examiner is urged 
to telephone Applicant's undersigned counsel at the number noted below if it will advance the 
prosecution of this application, or with any suggestion to resolve any condition that would 
impede allowance. In the event that any further extension of time is required, Applicant petitions 
for that extension of time required to make this response timely. Kindly charge any additional 
fee, or credit any surplus, to Deposit Account 50-0675, Order No. 5231.16-4004C. 



Respectfully submitted, 
SCHULTE ROTH & ZABEL 



Dated: July 8, 2002




Mailing Address: 
New York, New York 10022

(212) 756-2000 

(212) 593-5955 Telecopier 
EXHIBIT 2






VERSION OF REWRITTEN CLAIMS MARKED UP TO SHOW CHANGES 

1. (three times amended, last amended 7/8/2002) A microprocessor chip, 
comprising:

instruction pipeline circuitry; and 

table lookup circuitry designed to retrieve an entry from a table, each entry of the 
table being associated with a corresponding address range of an address space translated by 
address translation circuitry of the microprocessor chip, each entry describing a likelihood of 
the existence of an alternate coding of instructions located in the respective corresponding 
address range, the table lookup circuitry operable as part of the basic instruction cycle of 
executing an instruction of a non-supervisor mode program for execution on the 
microprocessor chip [executing on a computer] , the table being stored in storage that is 
architecturally invisible to programs in the native architecture of at least some instructions 
executed by the microprocessor chip ; 

interrupt circuitry cooperatively designed with the instruction pipeline circuitry to 
[synchronously] trigger a synchronous [an] interrupt [in accordance with interrupt criteria] on 
execution of an instruction of a process, wherein the architectural definition of the instruction 
of the process does not call for an interrupt, a trigger for the interrupt [criteria] being 
synchronously based at least in part on the table entry corresponding to [associated with] the 
address of the instruction of the process, the interrupt circuitry being designed to invoke a 
handler for the interrupt, the handler being responsive to a content of the table entry to affect 
the instruction pipeline circuitry to effect control of an architecturally-visible data
manipulation behavior or control transfer behavior of the instruction of the process, based at 
least in part on the contents of a table entry corresponding to [associated with] the address 
range in which the instruction of the process lies. 

2. (three times amended, last amended 7/8/2002) A method, comprising the steps of: 
as part of the basic instruction cycle of executing an instruction of a non-supervisor
mode program executing on a computer, consulting a table, the table having entries that are
[being] indexed by the address within an address space of instructions executed, entries of
the table containing attributes of instructions whose addresses index to the respective table
entries; and
controlling an architecturally-visible data manipulation behavior or control transfer
behavior of the instruction based at least in part on a content of a table entry indexed by
[associated with] the address of the instruction.

3. (twice amended, last amended 7/8/2002) The method of claim 2, wherein the 
control of control transfer behavior includes transfer of execution control to a second 
instruction for execution , the second instruction being an instruction other than an instruction 
architecturally-defined to be the successor instruction of the instruction. 



9. (twice amended, last amended 7/8/2002) The method of claim 2, further 
comprising the steps of: 

[synchronously] triggering a synchronous [an] interrupt on execution of the [an] 
instruction [of a process in accordance with interrupt criteria, the interrupt criteria being] 
based at least in part on a memory state of the computer and the address of the instruction, 
wherein the architectural definition of the instruction in the instruction's native [an emulated] 
architecture does not call for an interrupt. 

10. (twice amended, last amended 7/8/2002) A microprocessor chip, comprising: 
instruction pipeline circuitry; 

table lookup circuitry designed to index into a table by a memory address of a 
memory reference arising during execution of an architecturally-defined instruction, and to 
retrieve a table entry corresponding to the address, the table entry being distinct from the 
memory referenced by the memory reference; 

the instruction pipeline circuitry being responsive to the contents of the table entry to 
alter a manipulation of data or control transfer [of control] behavior of the instruction in a 
manner incompatible with the architectural definition of the instruction in the instruction's 
native architecture. 
11. (three times amended, last amended 7/8/2001) The microprocessor chip of claim
10, further comprising:

a binary translator programmed to translate at least a selected portion of a computer 
program from a first binary representation to a second binary representation; and

wherein the pipeline control circuitry is further designed to initiate a determination of 
whether to transfer control from an execution of the architecturally-defined instruction, the 
architecturally-defined instruction being an instruction of the first binary representation of 
the program, to the second binary representation, and effective to initiate the determination 
with neither a query nor a control transfer [of control] to the second binary representation 
being coded into the first binary representation. 

12. (twice amended, last amended 7/8/2002) The microprocessor chip of claim 10, 
further comprising: 

interrupt circuitry cooperatively designed with the instruction pipeline circuitry to 
trigger a synchronous [an] interrupt on execution of the architecturally-defined instruction [in 
accordance with interrupt criteria, the interrupt criteria being] based at least in part on the 
contents of the table entry [a memory state of the computer and the address of the 
instruction], wherein the architectural definition of the architecturally-defined instruction in 
the instruction's native [an emulated] architecture does not call for an interrupt. 

13. (twice amended, last amended 7/8/2001) The microprocessor chip of claim 12, 
further comprising: 

interrupt handler software designed to service the interrupt and to effect the altering 
of a control transfer [of control] behavior, the altering being in the form of returning control 
from the handler to an instruction flow [of the process] other than the instruction flow 
triggering the interrupt, the returned-to instruction flow for carrying on non-error handling 
normal processing logically equivalent to the architecturally-defined processing of 
instructions following the altered instruction [of the process] . 



17. (amended, last amended 7/8/2002) The microprocessor chip of claim 14, 
wherein: 

the instruction pipeline circuitry is responsive to the contents of the lookup structure
entry to affect an architecturally-visible manipulation of data or control transfer [of control]
behavior of [defined for] the instruction.

18. (twice amended, last amended 7/8/2002) The microprocessor chip of claim 14, 
further comprising: 

interrupt circuitry cooperatively designed with the instruction pipeline circuitry to 
trigger a synchronous [an] interrupt on execution of an instruction of a process [, 
synchronously based on interrupt criteria, the interrupt criteria] based at least in part on a 
lookup structure entry associated with [a memory state of the computer and] the address of 
the instruction, wherein the architectural definition of the instruction in the instruction's 
native [an emulated] architecture does not call for an interrupt. 

19. (four times amended, last amended 7/8/2002) A microprocessor chip, 
comprising: 

instruction pipeline circuitry; and 

interrupt circuitry cooperatively designed with the instruction pipeline circuitry to 
trigger a synchronous [an] interrupt on execution of an instruction of a process [in 
accordance with synchronous interrupt criteria, the interrupt criteria being] based at least in 
part on a memory state of the computer and the address of the instruction, wherein the 
architectural definition of the instruction in the instruction's native [an emulated] architecture 
does not call for an interrupt. 

23. (amended, last amended 7/8/2002) The microprocessor chip of claim 19, further 
comprising: 

table lookup circuitry designed to index into a table by a memory address within an 
address space of a memory reference arising during execution of the [an] instruction, and to 
retrieve a table entry corresponding to the memory-reference address;
the instruction pipeline circuitry being responsive to the contents of the table entry to
affect an architecturally-visible manipulation of data or control transfer [of control] behavior
of [defined for] the instruction. 



26. (twice amended, last amended 7/8/2002) The method of claim 24, further 
comprising the step of: 

altering a behavior of the instruction in a manner incompatible with the architectural 
definition [in an emulated architecture] of the instruction in the instruction's native 
architecture , based at least in part on a content of the lookup structure entry corresponding to 
the address range containing the instruction. 

27. (twice amended, last amended 7/8/2001) The method of claim 24, 
wherein each lookup structure entry corresponds to a page managed by a virtual
memory manager, and wherein circuitry for locating a lookup structure [an] entry [of the
table] is integrated with virtual memory address translation circuitry of the computer.

28. (twice amended, last amended 7/8/2002) The method of claim 24, further 
comprising the step of: 

based at least in part on a content of the lookup structure entry, transferring control to 
the alternative coding, the alternative coding being an instruction flow of the process other 
than the instruction flow triggering the consulting, the transferred [returned]-to instruction 
flow being programmed to carry on non-error handling normal processing of the process. 

29. (twice amended, last amended 7/8/2002) The method of claim 24, further 
comprising the step of: 

based at least in part on a content of the lookup structure entry, [synchronously] 
triggering a synchronous [an] interrupt [in accordance with interrupt criteria, the interrupt 
criteria being based at least in part on a memory state of the computer and the address of the 
instruction], wherein the architectural definition of the instruction in the instruction's native 
[an emulated] architecture [of the instruction] does not call for an interrupt. 
30. (twice amended, last amended 7/8/2001) A method, comprising the steps of:
on execution of an instruction of a process in a computer, [synchronously] triggering 
a synchronous [an] interrupt based at least in part on a memory state of the computer and the 
address of the instruction, wherein the architectural definition of the instruction in the
instruction's native architecture does not call for an interrupt. 

32. (amended 7/8/2002) The method of claim 31, further comprising the step of: 
changing an instruction set architecture under which instructions are interpreted by 
the computer in [the] handler software for the interrupt. 

34. (twice amended, last amended 7/8/2002) The method [microprocessor chip] of 
claim 30, further comprising: 

[table lookup circuitry designed to] indexing into a table by a memory address within 
an address space of a memory reference arising during execution of an instruction, and to 
retrieve a table entry corresponding to the address; 

responding [the instruction pipeline circuitry being responsive] to a content of the 
table entry to affect an architecturally-visible manipulation of data or control transfer [of
control] behavior of [defined for] the instruction. 

35. (added 1/9/2001) The method of claim 2, wherein the table entry indexed by the 
address of the instruction is associated with a range of instruction addresses. 

Kindly cancel claims 36, 37 and 38 without prejudice or disclaimer. 

39. (amended, last amended 7/8/2002) A method, comprising the steps of:
as part of the basic instruction cycle of executing an architecturally-defined
instruction of a non-supervisor mode program executing on a computer, retrieving an entry
from a table, the [entry of the] table entry being indexed by the address of a memory
reference arising during execution of the architecturally-defined instruction, the table entry
being distinct from the memory referenced by the memory reference;
based at least in part on a content of the table entry, altering a manipulation of data or
control transfer [of control] behavior of the architecturally-defined instruction in a manner
incompatible with the architectural definition [in an emulated architecture] of the
architecturally-defined instruction in the architecturally-defined instruction's native
architecture.



41. (amended, last amended 7/8/2002) The method of claim 39, wherein the [entry
of the] table entry is indexed by the address within an address space of instructions fetched 
for execution [executed]. 

42. (amended, last amended 7/8/2002) The method of claim 39, wherein: 
entries of the table correspond to respective address ranges, and the table entries
describe a likelihood of the existence of an alternate coding of instructions located in the
respective corresponding address ranges.

43. (amended, last amended 7/8/2002) The method of claim 39: 

wherein the architectural definition of the architecturally-defined instruction in the
architecturally-defined instruction's native [emulated] architecture does not call for an
interrupt;

and further comprising the step of [synchronously] triggering a synchronous [an]
interrupt based at least in part on a memory state of the computer and the address of the
architecturally-defined instruction.

44. (amended, last amended 7/8/2002) The method of claim 39, wherein the control 
of control transfer [of control] behavior includes transfer of execution control to a second 
instruction for execution. 

45. (amended, last amended 7/8/2002) The method of claim 44, wherein the second 
instruction is coded in an instruction set architecture (ISA) different than the ISA of the 
architecturally-defined [executed] instruction.

Kindly cancel claim 49.

50. (amended, last amended 7/8/2002) An apparatus, comprising:
instruction pipeline circuitry; and

table lookup circuitry designed to retrieve a table entry from a table whose entries are
indexed by an address within an address space of an instruction fetched for execution by the
instruction pipeline circuitry;

the instruction pipeline circuitry being responsive to a content of the table entry to
control an architecturally-visible data manipulation behavior or control transfer behavior of
the fetched instruction based at least in part on a content of the table entry indexed by
[associated with] the address of the instruction.

51. (amended, last amended 7/8/2002) The apparatus [method] of claim 50, wherein 
the control of control transfer behavior includes transfer of execution control to a second 
instruction for execution, the second instruction being coded in an instruction set architecture 
(ISA) different than the ISA of the executed instruction. 

52. (amended, last amended 7/8/2002) The apparatus [method] of claim 50, 
wherein entries of the table correspond to pages managed by a virtual memory
manager, circuitry for locating a table entry being integrated with virtual memory address
translation circuitry of the computer.

Kindly add the following new claim 53: 

53. (new, added 7/8/2002) The apparatus of claim 50, wherein: 

the table entry is stored in storage architecturally invisible to programs executing in 
the instruction's native architecture. 

Kindly amend claims 54-58 as follows:



54. (amended, last amended 7/8/2002) The apparatus [method] of claim 50, further
comprising [the steps of]:

circuitry designed to trigger a synchronous [triggering an] interrupt on execution of 
an instruction of a process, [synchronously] based at least in part on a memory state of the 
computer and the address of the instruction, wherein [when] the architectural definition of the 
instruction in the instruction's native [an emulated] architecture does not call for an interrupt. 

55. (amended, last amended 7/8/2002) The apparatus [method] of claim 50, wherein: 
the table entries are indexed by a virtual address of the instruction.

56. (amended, last amended 7/8/2002) The apparatus [method] of claim 50, wherein: 
the table entries are indexed by a physical address of the instruction. 

58. (amended 7/8/2002) The microprocessor chip of claim 19, wherein a trigger 
for the interrupt is [criteria are] further based on a [the] value of the instruction. 

Kindly add the following new claims. 

59. (new 7/8/2002) The microprocessor chip of claim 19, wherein: 

the memory state on which triggering the interrupt is based includes an entry of a 
table indexed by the address of instructions fetched for execution, entries of the table 
containing attribute indicia of instructions whose addresses index to the respective entries, 
the table entries being architecturally invisible to an architecture for execution in the 
instruction pipeline circuitry. 

60. (new 7/8/2002) The method of claim 24, wherein: 

the lookup structure entries are architecturally invisible to programs executing in an 
instruction set architecture executed by the instruction pipeline circuitry. 

61. (new 7/8/2002) A method, comprising the steps of:

as part of the basic instruction cycle of executing an instruction of a non-supervisor
mode program fetched for execution on a computer, consulting a table, entries of the table
being indexed by addresses of instructions fetched, entries of the table containing attribute
indicia of instructions whose addresses index to the respective entries; and

controlling an architecturally-visible data manipulation behavior or control transfer
behavior of the fetched instruction based at least in part on a content of a table entry indexed
by the address of the fetched instruction, the table entries being architecturally-invisible in
the fetched instruction's native architecture.

62. (new 7/8/2002) The method of claim 61, wherein the control of control transfer 
behavior includes transfer of execution control to a second instruction for execution, the 
second instruction being an instruction other than an instruction architecturally-defined to be 
the successor instruction of the fetched instruction. 

63. (new 7/8/2002) The method of claim 62, wherein the second instruction is coded 
in an instruction set architecture (ISA) different than the ISA of the fetched instruction. 

64. (new 7/8/2002) The method of claim 61, wherein the control of architecturally- 
visible data manipulation behavior includes changing an instruction set architecture under 
which instructions are interpreted by the computer. 

65. (new 7/8/2002) The method of claim 61, wherein the behavior control includes 
selecting between two different instruction set architectures, and the computer includes 
instruction pipeline circuitry designed to effect interpretation of computer instructions under 
the two instruction set architectures alternately. 

66. (new 7/8/2002) The method of claim 61, wherein:

the attribute indicia describe a likelihood of the existence of an alternate coding of 
instructions located in respective address ranges corresponding to the table entries. 



67. (new 7/8/2002) The method of claim 61:

wherein the architectural definition of the fetched instruction in the fetched 
instruction's native architecture does not call for an interrupt; 

and further comprising the step of triggering a synchronous interrupt based at least in 
part on a memory state of the computer and the address of the fetched instruction. 

68. (new 7/8/2002) The method of claim 61, further comprising the step of:
altering a behavior of the fetched instruction in a manner incompatible with the
architectural definition of the fetched instruction in the fetched instruction's native
architecture, based at least in part on a content of the table entry indexed by the address of
the fetched instruction.

69. (new 7/8/2002) The method of claim 61, wherein: 

entries of the table correspond to pages managed by a virtual memory manager, and 
wherein circuitry for locating an entry of the table is integrated with virtual memory address 
translation circuitry of the computer. 

70. (new 7/8/2002) An apparatus, comprising: 
instruction pipeline circuitry; and 

table lookup circuitry designed to retrieve a table entry from a table whose entries are 
indexed by an address of an instruction fetched for execution, the table being stored in 
storage that is architecturally invisible to programs in the fetched instruction's native 
architecture; 

the instruction pipeline circuitry being responsive to a content of the table entry to
control an architecturally-visible data manipulation behavior or control transfer behavior of
the fetched instruction based at least in part on a content of the table entry associated with the
address of the fetched instruction.

71. (new 7/8/2002) The apparatus of claim 70, wherein the control of control
transfer behavior includes transfer of execution control to a second instruction for execution, 
the second instruction being coded in an instruction set architecture (ISA) different than the 
ISA of the fetched instruction. 

72. (new 7/8/2002) The apparatus of claim 70, 

wherein entries of the table correspond to pages managed by a virtual memory 
manager, the table lookup circuitry being integrated with virtual memory address translation 
circuitry. 

73. (new 7/8/2002) The apparatus of claim 70, wherein: 

entries of the table describe a likelihood of the existence of an alternate coding of 
instructions located in address ranges corresponding to respective table entries. 

74. (new 7/8/2002) The apparatus of claim 70, wherein: 

the control of the fetched instruction's behavior includes altering a manipulation of 
data or control transfer behavior of the fetched instruction in a manner incompatible with the 
architectural definition of the fetched instruction in the fetched instruction's native 
architecture. 

75. (new 7/8/2002) The apparatus of claim 70, further comprising the steps of:
triggering a synchronous interrupt on fetch or execution of the fetched instruction,
based at least in part on a memory state of the computer and the address of the fetched
instruction, wherein the architectural definition of the fetched instruction in the fetched
instruction's native architecture does not call for an interrupt.

76. (new 7/8/2002) The apparatus of claim 70, wherein:

the table entries are indexed by virtual addresses of instructions. 



77. (new 7/8/2002) The apparatus of claim 70, wherein: 

the table entries are indexed by physical addresses of instructions.

78. (new 7/8/2002) The method of claim 9, wherein a portion of the memory state 
relevant to the interrupt trigger includes a content of the table entry indexed by the address of 
the instruction, wherein the architectural definition of the instruction does not call for an 
interrupt. 

79. (new 7/8/2002) The method of claim 2, wherein:

the controlling of instruction behavior includes altering a manipulation of data or 
control transfer [of control] behavior of the instruction in a manner incompatible with the 
architectural definition of the instruction in the instruction's native architecture. 

80. (new 7/8/2002) The method of claim 2, wherein: 

the step of consulting the table as part of executing an instruction is performed as part 
of fetching the instruction. 

THE CONVENTIONAL MACHINE LEVEL



This chapter introduces the conventional machine level (level 2) and discusses 
many aspects of its architecture. Historically, level 2 was developed before any of the 
other levels, and it is still widely (and incorrectly) regarded as "the" machine 
language. This situation has come about because on many machines the micropro- 
gram is in a read-only memory, which means that users (as opposed to the machine's 
manufacturer) cannot write programs for level 1. Furthermore, even on machines that
are user microprogrammable, the enormous complexity of the level 1 architecture is 
enough to scare off all but the most stouthearted programmers. In addition, because 
no machines have protection hardware at level 1, it is not possible to allow one person 
to debug new microprograms while anyone else is using the machine. This charac- 
teristic further inhibits user microprogramming. 



5.1. EXAMPLES OF THE CONVENTIONAL MACHINE LEVEL 

Rather than attempt to define rigorously what the conventional machine level is 
(which is probably impossible anyway), we will introduce this level by means of four 
examples. The next four sections are devoted to examining the conventional machine 
level of four families of well-known, commercially available computers: the IBM 370, 
the DEC PDP-11, the Motorola MC68000, and the Zilog Z80. The purpose of choos- 
ing four existing computers to study is to show how the ideas discussed here can be 
applied to the "real world." These machines will be compared and contrasted in many 
ways and they will continue to serve as running examples in succeeding chapters, as
they have in past ones. 

You should not draw the conclusion that the remainder of the book is about pro- 
gramming the 370, PDP-11, 68000 or Z80. These machines will be used to illustrate 
the idea of designing a computer as a series of levels. Various features of their 
respective organizations will be examined and some information about programming 
them will be introduced where necessary. Early in the chapter, the complete instruc- 
tion sets for all four machines will be presented in tables. You are not expected to 
fully understand them initially, although going through each list and making educated 
guesses about what the instructions probably do is certainly instructive. Many of the 
instructions will be discussed in more detail later in the chapter. 

Nevertheless, you should keep in mind the central, unifying idea that computers 
can be designed in a structured way. The technique of building a computer as a series 
of levels is a powerful structuring technique. An understanding of some of the details 
and idiosyncrasies of the four machines is necessary to understand the various levels 
but try to relate the details to the overall structure and do not wallow in them. 

5.1.1. IBM System/370 

In 1964 IBM introduced the System/360, a family of computers with identical 
level 2 architectures and instruction sets but spanning a wide range of performance 
and price. The idea behind the 360 was to allow customers to buy whichever model 
was appropriate at the time of purchase and later be able to upgrade to a larger model 
as the work increased, without having to rewrite any programs. In the early 1970s, 
IBM brought out various models of the System/370 series as successors to the 360 
series using more modern technology. The level 2 architecture of the 370 series is a
minor extension of the 360 series. In subsequent years IBM marketed the 43xx series
(4331, 4341, etc.), 30xx series (3031, 3032, 3033, etc.) and other computers whose
level 2 architectures are practically identical to that of the 370. For the sake of sim- 
plicity, we have chosen to refer to these machines collectively as the 370, but you 
should be aware that minor architectural differences exist among the various series, 
and even between models of one series. Thus at level 2, all the machines are nearly 
identical but at level 0 and level 1 they are all completely different. 

An IBM 370 consists of one or more CPUs, one or more I/O processors, a main 
memory, and various I/O devices, as shown in Fig. 5-1. Three kinds of I/O proces- 
sors exist, called multiplexer channels, block multiplexer channels, and selector 
channels, the first type being used with low-speed I/O devices, such as card readers, 
printers, and card punches, and the other two being used with high-speed I/O devices, 
such as disks, drums, and tapes. All the processors in the computer have access to 
main memory for reading and writing. The channels have a few internal registers but 
no main memory of their own. 

Fig. 5-1. Organization of an IBM 370 computer with one CPU and two I/O
processors (channels).

The smallest addressable unit in the main memory is the byte, consisting of 8
bits. Each byte has a unique address, numbered 0, 1, 2, 3, ..., n - 1, where n is
the number of bytes of memory, up to a maximum of n = 2^24. Two consecutive
bytes form a half word, 4 consecutive bytes form a word, and 8 consecutive bytes
form a double word. Words are more important than half words or double words, so
the 370 is often regarded as having a 32-bit word. The address of a half word, word,
or double word is the address of its lowest-numbered byte, which, on some older
models, must be an integral multiple of the unit's length (2, 4, or 8 bytes). Figure 5-2
illustrates the addressing structure of memory. Each byte is part of a half word, a word,
and a double word. For example, byte 7 is part of the half word at address 6. The CPU
has instructions for fetching bytes, half words, words, and double words.
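The alignment rule can be made concrete in a few lines of Python. This is a sketch under the assumption, stated in the text for older models, that every unit starts on a multiple of its own length; the names `UNIT_SIZES`, `is_aligned`, and `containing_units` are our own, not IBM terminology. Under that rule, the unit holding a given byte starts at the byte's address rounded down to a multiple of the unit size.

```python
# Lengths of the 370's addressable units, in bytes.
UNIT_SIZES = {"half word": 2, "word": 4, "double word": 8}
MAX_BYTES = 2 ** 24  # largest memory size given in the text (n = 2^24 bytes)


def is_aligned(address: int, size: int) -> bool:
    """True if a unit of `size` bytes may start at `address` under strict alignment."""
    return 0 <= address < MAX_BYTES and address % size == 0


def containing_units(byte_address: int) -> dict:
    """Start address of the aligned half word, word, and double word holding a byte."""
    return {name: byte_address - byte_address % size
            for name, size in UNIT_SIZES.items()}


print(containing_units(7))  # {'half word': 6, 'word': 4, 'double word': 0}
```

So byte 7 lies in the half word at 6, the word at 4, and the double word at 0, which is the nesting the figure depicts.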

Fig. 5-2. Addressing structure of the IBM 370's main memory.

The 370 has a special format for storing packed decimal numbers. Four bits are
needed to represent a digit in the range 0 to 9, so two decimal digits can be packed
into one 8-bit byte. In this format, a byte may contain any number from 0 to 99. If
used to store numbers in pure binary form, a byte can hold any number between 0 and
255, and 7 bits are sufficient to hold all the numbers from 0 to 99. The packed
decimal format therefore does not use memory as efficiently as pure binary.

Packed decimal numbers do have certain advantages over binary numbers, however.
Data input to the computer or output from the computer are in decimal notation,
because people use decimal numbers. When binary numbers are used internally,
the input must be converted from decimal to binary and the output
reconverted from binary to decimal. If the amount of computation is small, the CPU
may spend most of its time performing conversions. If the amount of computation is
large, the faster speed of the binary arithmetic instructions makes the conversions
worthwhile. Packed decimal numbers are widely used in business applications.
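A small Python sketch shows the two-digits-per-byte idea and the space cost relative to binary. It is a simplification: the real 370 packed format also keeps a sign code in the rightmost half byte, which this sketch omits, and the function names are hypothetical.

```python
def pack_decimal(digits: str) -> bytes:
    """Pack decimal digits two per byte, high-order digit in the high nibble.

    Simplification: no sign nibble is stored.
    """
    if not digits or not all("0" <= c <= "9" for c in digits):
        raise ValueError("expected a non-empty string of decimal digits")
    if len(digits) % 2:
        digits = "0" + digits  # pad on the left to fill whole bytes
    return bytes((int(hi) << 4) | int(lo)
                 for hi, lo in zip(digits[0::2], digits[1::2]))


def unpack_decimal(packed: bytes) -> str:
    """Inverse of pack_decimal: two decimal digits out of every byte."""
    return "".join(f"{b >> 4}{b & 0x0F}" for b in packed)


# 99 in packed decimal occupies a full byte, 0x99 (153 if read as binary),
# whereas pure binary needs only 7 bits to hold 99.
print(pack_decimal("99").hex(), (99).bit_length())  # 99 7
```

Conversion between the packed form and printable digits is just nibble shuffling, which is why packed data suits workloads dominated by decimal input and output.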

A program at the conventional machine level on a 370 has access to 20 high- 
speed CPU registers used for performing arithmetic and logical operations, as well as 
for storing intermediate results. Sixteen of these are general-purpose registers of 
length 32 bits, numbered from 0 to 15. The other four registers, numbered 0, 2, 4, 
and 6, are 64 bits long and are used for floating-point arithmetic. The 370 hardware 
also contains a number of other registers, such as an instruction register, program 
status word, MAR, and MBR but they are only accessible at the microprogramming 
level. Figure 5-3 illustrates the registers used by conventional-machine-level pro- 
grams. The 370 also has 16 control registers, but these are used only by the operating 
system and will not concern us further. 

The 370 level 2 instructions are either 16, 32, or 48 bits in length and may be 
located at any even address. Nearly all instructions contain an 8-bit operation code, 
specifying which operation is to be carried out. The remaining bits are used to 
specify where the data for the instruction is located— in registers, memory, or both. 
The 370 has some general-purpose instructions used in almost all programs, some 
instructions intended primarily for scientific calculations, and some instructions pri- 
marily useful for commercial applications. The level 2 instruction set contains about 
200 instructions. A list of most of these instructions is given in Fig. 5-4. 






[Figure: general registers 0 through 15, 32 bits each; floating-point registers 0, 2, 4, and 6, 64 bits each.]
Fig. 5-3. General registers and floating-point registers on the 370. 

5.1.2. DEC PDP-11 

The DEC PDP-11 series consists of a number of small to medium-sized
computers. Due to their short (16-bit) word length, they are often called minicomputers,
although, as we mentioned earlier, the boundaries between mainframes,
minicomputers, and microcomputers are highly elusive. The PDP-11s are widely used in such
applications as data communication, industrial process control, scientific experiment
monitoring and data collection, interactive computer graphics, and education.
A PDP-11 consists of a CPU, a main memory, and various I/O devices, as

More complicated I/O devices require more status registers. The RL01/RL02
disk, for example, uses a total of four registers containing a variety of fields, as 
shown in Fig. 5-39(b). Each register has a unique memory address by which the pro- 
gram can read it or store into it. More sophisticated disks can have more than a 
dozen device registers with more than 100 fields. The RL01/RL02 control register 
provides status about the controller and drives, including error reporting. It also has a 
field which the program loads with a function code to indicate the operation desired. 
The meaning of the other three registers depends on the function code loaded. For 
READ and WRITE, the other three contain the memory address to read into or write 
from, the disk address, and the word count. For SEEK, WRITE CHECK, and other 
operations, they have somewhat different meanings. 

The Z80 has explicit I/O instructions, although system designers are free to do 
I/O with memory mapping if they choose, as we did in Chap. 3. When memory map- 
ping is not used, the Z80 still has (8-bit-wide) device registers, only now they are not 
part of the memory address space. Instead, each device register has a number from 0 
to 255, called an I/O port. 

The simplest I/O instructions are IN A,(N), which copies one byte from port N to 
the A register, and OUT (N),A, which copies one byte from the A register to port N. 
The byte transferred may be either data, control information, or status information, 
depending on the port selected. Slightly more complex are IN R,(C) and OUT (C),R, 
which take the port number from the C register. The next step up are INI, IND, 
OUTI, and OUTD. All these instructions expect a memory address in HL, a count in 
B, and a port number in C. First, a normal IN or OUT is done, copying a byte to or
from the memory location pointed to by HL. Then HL is incremented or decremented
by 1 and B is decremented by 1. The final I/O instructions are INIR, INDR, OTIR,
and OTDR, which are the same as the previous four, except that they keep going until 
B = 0. In effect, each one does a block transfer to or from memory. This transfer is 
not DMA, however, because the CPU is occupied the entire time. Nevertheless, it is 
faster and takes up less space in memory than an explicitly programmed loop to do 
the same job. 
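The register conventions of INI, OUTI, INIR, and OTIR can be modeled in Python. This is a toy sketch, not an emulator: the class and attribute names are hypothetical, ports are modeled as byte queues, and flags, timing, and the Z80's use of B on the upper address lines during I/O are ignored. It also makes the point about DMA visible: the loop that moves every byte is run by the "CPU" itself.

```python
from collections import deque


class Z80BlockIO:
    """Toy model of HL, B, C, memory, and the Z80 block I/O instructions."""

    def __init__(self):
        self.mem = bytearray(65536)
        self.HL = 0
        self.B = 0
        self.C = 0
        self.inputs = {}   # port number -> deque of bytes the device will supply
        self.outputs = {}  # port number -> list of bytes written to the device

    def ini(self):
        """INI: read port C into (HL), then HL += 1 and B -= 1."""
        self.mem[self.HL] = self.inputs[self.C].popleft()
        self.HL = (self.HL + 1) & 0xFFFF
        self.B = (self.B - 1) & 0xFF

    def outi(self):
        """OUTI: write (HL) to port C, then HL += 1 and B -= 1."""
        self.outputs.setdefault(self.C, []).append(self.mem[self.HL])
        self.HL = (self.HL + 1) & 0xFFFF
        self.B = (self.B - 1) & 0xFF

    def inir(self):
        """INIR: repeat INI until B reaches 0 (B = 0 on entry means 256 bytes)."""
        self.ini()
        while self.B != 0:
            self.ini()

    def otir(self):
        """OTIR: repeat OUTI until B reaches 0."""
        self.outi()
        while self.B != 0:
            self.outi()


cpu = Z80BlockIO()
cpu.inputs[0x10] = deque(b"HELLO")
cpu.HL, cpu.B, cpu.C = 0x8000, 5, 0x10
cpu.inir()  # block input: 5 bytes from port 0x10 into memory at 0x8000
print(bytes(cpu.mem[0x8000:0x8005]), hex(cpu.HL), cpu.B)  # b'HELLO' 0x8005 0

cpu.HL, cpu.B, cpu.C = 0x8000, 5, 0x20
cpu.otir()  # block output: the same 5 bytes to port 0x20
```

The repeat instructions are a test-and-loop built into one opcode, so the program needs no explicit loop, but the processor is busy for every byte transferred.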



5.5. FLOW OF CONTROL 



Flow of control refers to the sequence in which instructions are executed. In gen- 
eral, successively executed instructions are fetched from consecutive memory loca- 
tions. Procedure calls cause the flow of control to be altered, stopping the procedure 
currently executing and starting the called procedure. Coroutines are related to pro- 
cedures and cause similar alterations in the flow of control. Traps and interrupts also 
cause the flow of control to be altered when special conditions occur. All these topics 
will be discussed in the following sections. 

5.5.1. Sequential Flow of Control and Jumps

Most instructions do not alter the flow of control. After an instruction is exe- 
cuted, the one following it in memory is fetched and executed. After each instruc- 
tion, the program counter is increased by the number of memory locations in that 
instruction. If observed over an interval of time that is long compared to the average 
instruction time, the program counter is approximately a linear function of time, 
increasing by the average instruction length per average instruction time. Stated 
another way, the dynamic order in which the processor actually executes the instruc- 
tions is the same as the order in which they appear on the program listing. 

If a program contains jumps, this simple relation between the order in which 
instructions appear in memory and the order in which they are executed is no longer 
true. When jumps are present, the program counter is no longer a monotonically 
increasing function of time, as shown in Fig. 5-40(b). As a result, it becomes diffi- 
cult to visualize the instruction execution sequence from the program listing. When 
programmers have trouble keeping track of the sequence in which the processor will 
execute the instructions, they are prone to make errors. This observation led Dijkstra 
(1968a) to write a then controversial letter entitled "GO TO Statement Considered 
Harmful," in which he suggested avoiding GO TO statements. Since that time 
languages without GO TO statements have become popular. Of course, programs written in
these languages compile down to level 2 programs that may contain many jumps, because the
implementation of IF, WHILE, and other high-level control structures requires jumping
around.





Fig. 5-40. Program counter as a function of time (smoothed). (a) Without
jumps. (b) With jumps.



Jumps are frequently difficult to avoid when programming in a language lacking 
structuring statements such as IF ... THEN ... ELSE and WHILE ... DO .... Relying 
on jumps as the primary method of controlling the flow of execution makes it difficult,
if not impossible, to write error-free, well-structured programs. For this and
other reasons, programming at the conventional machine level is becoming obsolete. 

5.5.2. Procedures 

The most important technique for structuring programs is the procedure. From 
one point of view, a procedure call alters the flow of control just as a jump does, but 
unlike the jump, when finished performing its task, it returns control to the statement 
or instruction following the call. 

However, from another point of view, a procedure body can be regarded as defin- 
ing a new instruction on a higher level. From this standpoint, a procedure call can be 
thought of as a single instruction, even though the procedure may be quite compli- 
cated. Similarly, a person programming at level 2 can certainly regard the multiplica- 
tion instruction as a single instruction, even though it is carried out by an interpreter 
running at level 1 as a large number of successive steps. 

By writing a collection of procedures, a programmer can define a new level with 
a new, larger, and more convenient instruction set. Programs for this new level con- 
sist of sequences of instructions, some of which are procedure calls and some of 
which are the original level 2 instructions. Associated with the execution of a pro- 
gram at this new level is a "virtual program counter," which points to the current 
instruction (counting a procedure execution as a single instruction) and increases 
monotonically in time. The direct correspondence between the execution sequence 
and the listing sequence makes it easy to understand what the program does. 

In Sec. 5.4.5, we mentioned recursive procedures — that is, procedures that call 
themselves. Now we will give an example of one. The "Towers of Hanoi" is an 
ancient problem that has a simple solution involving recursion. The problem requires 
three pegs, on the first of which sit a series of n concentric disks, each of which is 
smaller in diameter than the disk directly below it. The second and third pegs are ini- 
tially empty. The object is to transfer all the disks to peg 3, one disk at a time, but at 
no time may a larger disk rest on a smaller one. Figure 5-41 shows the initial
configuration for n = 5 disks.

The solution of moving n disks from peg 1 to peg 3 consists first of moving 
n - 1 disks from peg 1 to peg 2, then moving 1 disk from peg 1 to peg 3, then mov- 
ing n - 1 disks from peg 2 to peg 3 (see Fig. 5-42). To solve the problem we need 
a procedure to move n disks from peg i to peg j. Whenever this procedure is called by

towers(n, i, j)

the solution is printed out. The procedure first tests to see if n = 1. If so, the
solution is trivial: just move the one disk from i to j. If n ≠ 1, the solution consists of
three parts as discussed above, each being a recursive procedure call.
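The recursive procedure can be transcribed almost directly. The sketch below is in Python rather than level 2 code, assumes the pegs are numbered 1, 2, and 3 (so the spare peg is 6 - i - j), and, as an addition of our own, also returns the list of moves so the result can be checked.

```python
def towers(n, i, j, moves=None):
    """Print (and record) the moves that transfer n disks from peg i to peg j."""
    if moves is None:
        moves = []
    if n == 1:
        print(f"Move a disk from {i} to {j}")
        moves.append((i, j))
    else:
        k = 6 - i - j               # the third peg, since 1 + 2 + 3 = 6
        towers(n - 1, i, k, moves)  # move the n - 1 smaller disks out of the way
        towers(1, i, j, moves)      # move the largest disk to its destination
        towers(n - 1, k, j, moves)  # move the smaller disks back on top of it
    return moves


moves = towers(3, 1, 3)
```

Each call with n > 1 makes three recursive calls, so moving n disks takes 2^n - 1 moves; the 5-disk configuration of Fig. 5-41 needs 31.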

5.5.4. Traps

A trap is a kind of automatic procedure call initiated by some condition caused 
by the program, usually an important but rarely occurring condition. A good example 
is overflow. On many computers, if the result of an arithmetic operation exceeds the 
largest number that can be represented, a trap occurs, meaning that the flow of control 
is switched to some fixed memory location instead of continuing in sequence. At that 
fixed location is a jump to a procedure called the overflow trap handler, which per- 
forms some appropriate action, such as printing an error message. If the result of an 
operation is within range, no trap occurs. 

The essential point about a trap is that it is initiated by some exceptional condi- 
tion caused by the program itself and detected by the hardware or microprogram. An 
alternative method of handling overflow is to have a 1-bit register that is set to 1 
whenever an overflow occurs. A programmer who wants to check for overflow must 
include an explicit "jump if overflow bit is set" instruction after every arithmetic 
instruction. Doing so would be both slow and wasteful of space. Traps save both 
time and memory compared with explicit programmer controlled checking. 

The trap may be implemented by an explicit test performed by the interpreter at 
level 1. If an overflow is detected, the trap address is loaded into the program 
counter. What is a trap at one level may be under program control at a lower level. 
Having the microprogram make the test still saves time compared to a programmer 
test, because it can be easily overlapped with something else. It also saves memory, 
because it need only occur in a few level 1 procedures, independent of how many 
arithmetic instructions occur in the main program. 
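The division of labor described above can be sketched in Python, with an exception standing in for the trap. The 16-bit two's-complement word and all names here are assumptions made for illustration. The overflow test lives in one place, the interpreter's ADD routine, and the level 2 "program" contains no overflow checks at all. The overflow-bit alternative is included for comparison.

```python
WORD_MIN, WORD_MAX = -(2 ** 15), 2 ** 15 - 1  # assumed 16-bit two's complement


class OverflowTrap(Exception):
    """Plays the role of the transfer to the overflow trap handler."""


def wrap16(value):
    """Reduce an integer to the 16-bit two's-complement range."""
    return (value - WORD_MIN) % 2 ** 16 + WORD_MIN


def add_with_trap(a, b):
    """ADD as a level 1 interpreter might run it: one built-in overflow test."""
    raw = a + b
    if not WORD_MIN <= raw <= WORD_MAX:
        raise OverflowTrap(f"overflow in {a} + {b}")
    return raw


def add_with_overflow_bit(a, b):
    """The alternative: return the wrapped sum plus a 1-bit overflow register."""
    raw = a + b
    return wrap16(raw), int(wrap16(raw) != raw)


def running_sum(values):
    """A level 2 'program': no explicit overflow check after each addition."""
    total = 0
    for v in values:
        total = add_with_trap(total, v)
    return total


try:
    result = running_sum([30000, 2000, 1000])
except OverflowTrap as trap:  # the overflow trap handler
    result = None
    trap_message = str(trap)
    print("trap handler entered:", trap_message)
```

With the overflow-bit style, every caller of `add_with_overflow_bit` would need its own test of the returned bit, which is exactly the per-instruction cost in time and space that the trap avoids.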

A few common conditions that can cause traps are floating-point overflow, 
floating-point underflow, integer overflow, protection violation, undefined opcode, 
stack overflow, attempt to start nonexistent I/O device, attempt to fetch a word from 
an odd-numbered address and division by zero. 

5.5.5. Interrupts 

Interrupts are changes in the flow of control caused not by the running program 
but by something else, usually related to I/O. For example, a program may instruct 
the disk to start transferring information, and set the disk up to provide an interrupt as 
soon as the transfer is finished. Like the trap, the interrupt stops the running program 
and transfers control to an interrupt handler, which performs some appropriate action. 
When finished, the interrupt handler returns control to the interrupted program. It 
must restart the interrupted process in exactly the same state it was in when the inter- 
rupt occurred, which means restoring all the internal registers to their preinterrupt 
state. 

The essential difference between traps and interrupts is this: traps are synchro- 
nous with the program and interrupts are asynchronous. If the program is rerun a 
million times with the same input, the traps will recur in the same place each time 
but the interrupts may vary, depending, for example, on precisely when a person at a 
terminal pushes the carriage return key. The reason for the reproducibility of traps 
and irreproducibility of interrupts is that traps are caused directly by the program and 
interrupts are, at best, indirectly caused by the program. 

The need for interrupts arises when input or output can proceed in parallel with 
CPU execution. On computers where the CPU issues an I/O instruction and then 
stops to wait for the I/O to be completed, there is no need for an interrupt. When the 
I/O is finished, the CPU is automatically restarted at the instruction following the I/O 
instruction. Because the CPU can generally execute many thousands of instructions 
in the time required to complete a single I/O operation, it is wasteful to force 
the CPU to be idle during this time. Interrupt schemes allow the CPU to compute 
concurrently with the I/O and be signaled as soon as the I/O is completed. 

A large computer may have many I/O devices running at the same time. For 
example, it might be reading data from cards, printing results on the line printer, writ- 
ing output on a disk for future use, and plotting a graph of results on the plotter. All 
this activity can lead to complicated situations. When the card reader has finished 
reading a card, the CPU is interrupted and the card reader service procedure is begun. 
The card reader service procedure must move the card just read to the main memory 
location where the CPU expects it (if it is not already there), check to see if any read- 
ing errors occurred, possibly check to see if each card column contains a valid charac- 
ter, and issue an instruction to start reading the next card. 

A nonzero probability exists that another I/O device (for example, the disk) will 
complete its I/O instruction before the reader service procedure has completed its task. 
This situation can be handled in one of two ways. First, the disk can cause a CPU 
interrupt, halting execution of the card reader service procedure and starting execution 
of the disk service procedure. Second, the disk can be forced to wait until the reader 
service procedure is finished and can then cause an interrupt. We will now examine 
these possibilities in detail. 

If we allow the disk to interrupt the card reader service procedure, we must also 
be prepared for the printer to interrupt the disk service procedure and for the plotter to 
interrupt the printer service procedure. If this interrupt sequence actually occurs, it is 
necessary to decide what to do when the plotter service procedure finishes. Possibili- 
ties are to continue the printer, disk, or card reader service procedures or to continue 
the CPU program that was running when the card reader interrupt occurred. It is clear 
that the administration involved in keeping track of which procedure to run when can 
get complicated. 

One method for simplifying this administration is to require that all interrupts be 
transparent, which means that whenever an interrupt occurs, the state of the inter- 
rupted process is saved, including the program counter, registers, and condition codes. 
The interrupt service procedure is then run. Finally, the state of the interrupted pro- 
cess is restored to exactly the same condition it was in when the interrupt occurred, 
and the process is restarted. 
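A transparent interrupt can be sketched as save, run, restore. The sketch below assumes a simplified state made of a program counter, eight registers, and a condition-code word (the register count is arbitrary); the handler is an ordinary Python callable standing in for a service procedure.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class CPUState:
    """Simplified state of the interrupted process."""
    pc: int
    registers: list = field(default_factory=lambda: [0] * 8)
    condition_codes: int = 0


def run_transparent_interrupt(cpu: CPUState, service_procedure) -> None:
    """Save the state, run the service procedure, then restore the state exactly."""
    saved = copy.deepcopy(cpu)                 # save PC, registers, condition codes
    service_procedure(cpu)                     # the handler may change any of them
    cpu.pc = saved.pc                          # restore everything before resuming
    cpu.registers = saved.registers
    cpu.condition_codes = saved.condition_codes


def clobber(c: CPUState) -> None:
    """A hypothetical service procedure that disturbs every part of the state."""
    c.pc = 999
    c.registers[0] = -1
    c.condition_codes = 0


cpu = CPUState(pc=42, registers=list(range(8)), condition_codes=0b0100)
run_transparent_interrupt(cpu, clobber)        # cpu is back to pc=42, same registers
```

Because the interrupted program sees exactly the state it left, it cannot tell that the handler ran.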

The interrupted process neither requires any special precautions nor needs to be 
concerned with the interrupt handling. It is not even aware of its existence (unless it 
is timing something). Because the program running at the time of the interrupt is not 
aware of the fact that it has been interrupted, stopped, and later restarted, the interrupt 
is said to be transparent (or invisible). If all interrupts are transparent, an interrupt 
procedure will not even notice if it, itself, is interrupted. 

Turning again to our earlier example, it is clear that the card reader interrupt must 
be transparent to the main program, the disk interrupt must be transparent to the card 
reader service procedure, and so on. When the plotter service procedure finally fin- 
ishes its task, the printer service procedure must be restarted, not one of the other service 
procedures. 

Similarly, for the printer service procedure to be transparent to the disk service 
procedure, the latter must be restarted (when the printer service procedure is through) 
from the point where it was interrupted. In other words, the interrupt service pro- 
cedures must be restarted in the reverse order in which they occurred, as shown in 
Fig. 5-49. Whenever an interrupt service procedure completes its task, the most 
recently interrupted procedure must be restarted. 

From Fig. 5-49 we see that interrupts are nested in time, meaning that a program 
will not be restarted until all the interrupts subsequent to it have been completely pro- 
cessed. Situations involving nesting are common in computer science. All nesting 
situations have one property in common: an inner nest is always completed before the 
surrounding nest is completed. A stack can often be used to implement a nesting 
situation. Whenever a new nest is entered, the state of the computation just before 
the entry is saved on the stack. Whenever a nest is exited, the state of the computa- 
tion just previous to entering that nest is popped off the stack and restored. 
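The nesting rule can be illustrated with a Python list used as a stack. This sketch replays the card reader, disk, printer, and plotter sequence from the example as a hypothetical event list and reports the order in which interrupted procedures are restarted. It tracks only procedure names, not full saved states.

```python
def replay(events):
    """Replay ("interrupt", name) and ("finish", None) events; return the
    order in which interrupted procedures are restarted."""
    stack = []                     # saved states, most recent on top
    running = "main program"
    restarted = []
    for kind, name in events:
        if kind == "interrupt":
            stack.append(running)  # entering a new nest: save the current state
            running = name
        elif kind == "finish":
            running = stack.pop()  # leaving a nest: restore the most recent state
            restarted.append(running)
    return restarted


events = [("interrupt", "card reader"), ("interrupt", "disk"),
          ("interrupt", "printer"), ("interrupt", "plotter")] + [("finish", None)] * 4
print(replay(events))  # ['printer', 'disk', 'card reader', 'main program']
```

The restart order is the reverse of the interrupt order, which is what the stack discipline guarantees.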

Figure 5-50 illustrates the use of a stack for processing interrupts. The numbers 1 
to 9 represent the time intervals shown in Fig. 5-49. During interval 1, no preceding 
state need be remembered, and the stack is empty. After the card reader interrupt has 
occurred, the state of the main program at the time of the interrupt must be remem- 
bered so that it can be restarted in the correct place later. This situation is shown as 
2. When the disk interrupts the card reader service procedure, the state of the card 
reader service procedure must also be stacked, shown as 3. 

Each of the nine stack configurations refers to some sequence of as yet uncom- 
pleted interrupt procedures. If another interrupt occurs at 7, while the disk service 
procedure is running, the state of that procedure will be saved again. If the computer 
possesses many I/O devices, a given interrupt service procedure may be stopped and 
resumed several times before it completes. 

The model we have just given for interrupt processing is, however, not complete 
because we have ignored the critical timing aspects of I/O devices. Some I/O devices 
must be serviced within a specific time interval or information will be lost. For 
example, if data are being transmitted over a communication line at 960 
characters/sec, a character arrives in the receiver buffer every 1042 µsec. If the inter- 
rupt service procedure fails to fetch the character within 1042 µsec, it may be 
overwritten by the next one and lost. An interrupt system needs a provision for han- 
dling this kind of problem. In other words, once the service procedure for a highly 
critical interrupt has begun, it must not be interrupted by a less critical interrupt. 
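The 1042 µsec figure follows from dividing one second by the line rate. Here is a small Python check, where the service latency passed in is a hypothetical value:

```python
def arrival_interval_usec(chars_per_sec: float) -> float:
    """Time between successive characters arriving on a serial line, in microseconds."""
    return 1_000_000 / chars_per_sec


def serviced_in_time(latency_usec: float, chars_per_sec: float) -> bool:
    """True if the handler fetches each character before the next one overwrites it."""
    return latency_usec < arrival_interval_usec(chars_per_sec)


print(round(arrival_interval_usec(960)))  # 1042
print(serviced_in_time(1500, 960))        # False: the character would be lost
```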

Fig. 5-50. Use of a stack for interrupt handling. Each box is the saved state 
of the indicated process. The numbers correspond to Fig. 5-49. The stack 
grows downward. 

When an I/O processor on the 370 finishes executing its program, it can interrupt 
the CPU. It does so in the following steps. The CPU has a 64-bit register called a 
program status word (PSW) that contains the program counter, condition code, inter- 
rupt code, and other pieces of status information, as illustrated in Fig. 5-51. When an 
interrupt occurs, the PSW's interrupt code is set to the number of the interrupting 
device. Then the PSW is stored in location 56 and a new PSW is loaded from loca- 
tion 120. As soon as the new PSW has been loaded, the CPU begins execution at the 
start of a general interrupt service procedure. The interrupt service procedure must 
first determine which I/O device finished by examining the interrupt code in the old 
PSW at location 56. Then it can call the appropriate service procedure. 

If a second interrupt occurred while the first one was being processed, the current 
PSW would be stored at location 56, thereby erasing the one already there. The situa- 
tion is similar to a procedure call instruction that always puts the return address in a 
fixed place in memory. To prevent having a PSW overwritten before the interrupt 
procedure has had a chance to save it, the 370 has a bit associated with each I/O pro- 
cessor called a mask bit. When this bit is a 0, interrupts from that I/O processor are 
forced to wait until it becomes a 1. Setting mask bits to zero is called disabling 
interrupts. The mask bits are located in an internal processor register. If the I bit in 
Fig. 5-51 is 0, all interrupts are disabled, no matter what values the mask bits have. 
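The PSW swap and the overwrite hazard can be modeled in a few lines. In this simplified Python sketch a PSW is a dict with only a program counter, an interrupt code, and the I bit, and the per-I/O-processor mask bits are a dict. Locations 56 and 120 are the ones given in the text; everything else is an assumption for illustration.

```python
OLD_IO_PSW = 56    # location where the old PSW is stored (from the text)
NEW_IO_PSW = 120   # location from which the new PSW is loaded (from the text)


class System370Model:
    """Highly simplified model; a PSW holds only pc, interrupt_code, and i_bit."""

    def __init__(self, handler_psw: dict):
        self.memory = {NEW_IO_PSW: dict(handler_psw)}
        self.psw = {"pc": 0, "interrupt_code": None, "i_bit": 1}
        self.mask_bits = {}        # I/O processor number -> 0 or 1 (default 1)
        self.pending = []          # interrupts forced to wait

    def interrupt(self, device: int) -> bool:
        """Return True if the interrupt is taken, False if it must wait."""
        if self.psw["i_bit"] == 0 or self.mask_bits.get(device, 1) == 0:
            self.pending.append(device)
            return False
        self.psw["interrupt_code"] = device          # record who interrupted
        self.memory[OLD_IO_PSW] = dict(self.psw)     # overwrites any earlier old PSW
        self.psw = dict(self.memory[NEW_IO_PSW])     # enter the general handler
        return True


# Handler PSW with I = 0: a second interrupt waits instead of erasing location 56.
safe = System370Model({"pc": 0x2000, "interrupt_code": None, "i_bit": 0})
safe.psw["pc"] = 0x500
safe.interrupt(3)      # taken; old PSW (pc 0x500, code 3) stored at 56
safe.interrupt(4)      # forced to wait; location 56 is untouched

# Handler PSW with I = 1: the second interrupt overwrites the main program's PSW.
unsafe = System370Model({"pc": 0x2000, "interrupt_code": None, "i_bit": 1})
unsafe.interrupt(6)
unsafe.interrupt(7)    # location 56 now holds the handler's PSW, not the program's
```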

When the PDP-11 is interrupted by an I/O device, the PSW (see Fig. 5-51) and 
program counter are pushed onto the stack, and a new PSW and program counter are 
loaded from the memory address associated with the I/O device. These memory 
addresses are called interrupt vectors and each device has a unique one. During the 
hardware interrupt sequence, the device specifies an interrupt vector by putting the 
vector's address on the UNIBUS. Each interrupt vector contains the starting address 
of the service procedure for the corresponding device, thus eliminating the need to test 
which device caused the interrupt. The 370 has only one interrupt vector for all I/O 
devices, so the interrupt handler must first determine which device wants attention. 
Only then can it call the proper service procedure. 
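The difference between the two dispatch styles can be sketched as follows. The vector address and memory contents are hypothetical. The only PDP-11 detail relied on is that a vector holds the new PC followed by the new PSW in the next word.

```python
def pdp11_dispatch(memory: dict, vector_address: int) -> tuple:
    """Vectored interrupt: the device names its vector, and the new PC and
    new PSW are the two consecutive words stored there."""
    new_pc = memory[vector_address]
    new_psw = memory[vector_address + 2]   # next 16-bit word (byte addressing)
    return new_pc, new_psw


def s370_dispatch(old_psw: dict, service_procedures: dict):
    """Single vector: the general handler looks up the interrupt code saved in
    the old PSW to find the device's own service procedure."""
    return service_procedures[old_psw["interrupt_code"]]


# Hypothetical vector at octal 220: handler at octal 4000, PSW priority field 5.
memory = {0o220: 0o4000, 0o222: 0o240}
print(pdp11_dispatch(memory, 0o220))
print(s370_dispatch({"interrupt_code": 3}, {3: "disk_service", 4: "tape_service"}))
```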
R enables event recording 

D enables virtual memory 

I controls interrupt enabling 

Key is used for memory protection 

W indicates CPU is waiting for interrupt 

Mask enables underflow/overflow traps 

SHXNZVC, CC are condition codes 

Priority is CPU priority 

M, mode are user/kernel mode bits 

T bit is for tracing (trap after every instruction) 

Fig. 5-51. The PSW on four computers. The 370 PSW is the EC format, 
and only the upper 32 bits are shown. The lower 32 bits contain eight zeros 
and the program counter. 

The PDP-11 has a system of priority interrupts. Each I/O device has a priority 
number associated with it. The CPU also has a priority (from 0 to 7), which can be 
set by the program, and which is part of the PSW. If the priority of the I/O device is 
higher than the current CPU priority, the interrupt takes place; otherwise, it is forced 
to wait until the CPU priority is reduced. This feature can be used to ensure that 
time-critical interrupt service procedures can be interrupted only by still more critical 
ones, not by less critical ones. For example, if the disk interrupt service procedure 
runs at priority 5, an interrupt from magnetic tape at priority 6 can interrupt it but an 
interrupt from the paper tape reader at priority 4 will be forced to wait until the CPU 
priority is set to 3 or less, which will not occur until the critical disk service pro- 
cedure is finished. The priority of an I/O device is determined by a switch on the 
device itself. The devices that need the fastest service are naturally given the highest 
priorities. Traps use interrupt vectors, the same as true interrupts. 
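Here is a minimal Python sketch of the acceptance rule, replaying the disk, magnetic tape, and paper tape reader example. For simplicity it assumes that accepting an interrupt does not itself change the CPU priority, and it keeps waiting requests in a plain list.

```python
class PriorityCPU:
    """Sketch of PDP-11-style priority interrupt acceptance (simplified)."""

    def __init__(self, priority: int = 0):
        self.priority = priority   # 0..7, part of the PSW
        self.pending = []          # (device_priority, device) requests forced to wait

    def request(self, device: str, device_priority: int) -> bool:
        """Accept the interrupt only if the device outranks the CPU."""
        if device_priority > self.priority:
            return True
        self.pending.append((device_priority, device))
        return False

    def set_priority(self, new_priority: int) -> list:
        """Change CPU priority; return the waiting devices that may now
        interrupt, highest priority first."""
        self.priority = new_priority
        ready = sorted((p, d) for p, d in self.pending if p > new_priority)
        self.pending = [(p, d) for p, d in self.pending if p <= new_priority]
        return [d for p, d in reversed(ready)]


cpu = PriorityCPU(priority=5)              # disk service procedure running
cpu.request("magnetic tape", 6)            # True: taken at once
cpu.request("paper tape reader", 4)        # False: forced to wait
print(cpu.set_priority(3))                 # ['paper tape reader']
```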

The 68000's interrupt system is similar to the PDP-11's, including the eight prior- 
ity levels. When an external device whose priority is higher than the CPU's signals 
an interrupt, the PSW and program counter are stacked and a new program counter is 
fetched from the interrupt vector. A new PSW is not fetched from memory as on the 
PDP-11; instead, the CPU priority is set to that of the interrupting device. This approach 
saves space in the interrupt vectors because no PSWs need be stored but means that a 
priority n device service routine can be interrupted by a priority n + 1 device before 
the routine has been able to execute even one instruction. On the PDP-11, the new 
PSW may contain priority 7 to prevent all other interrupts for a few instructions to 
allow the routine to do some initial work without being disturbed. Afterward the rou- 
tine can lower the priority if it wants to. 
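The contrast between the two entry rules fits in a few lines of Python. The function names are hypothetical. The sketch only computes the CPU priority a service routine starts with and whether a given request could break in at that moment.

```python
from typing import Optional


def priority_after_entry(machine: str, device_level: int,
                         new_psw_priority: Optional[int] = None) -> int:
    """CPU priority in effect when a device service routine starts."""
    if machine == "68000":
        return device_level            # no new PSW: take the device's level
    if machine == "PDP-11":
        if new_psw_priority is None:
            raise ValueError("PDP-11 entry needs the priority in the new PSW")
        return new_psw_priority        # whatever the vector's new PSW specifies
    raise ValueError(f"unknown machine: {machine}")


def can_interrupt(cpu_priority: int, requester_level: int) -> bool:
    """A request breaks in only if it outranks the current CPU priority."""
    return requester_level > cpu_priority


# A level-6 request arrives just as a level-5 device's routine starts:
print(can_interrupt(priority_after_entry("68000", 5), 6))      # True
print(can_interrupt(priority_after_entry("PDP-11", 5, 7), 6))  # False
```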

On the 370, PDP-11, and 68000 but not the Z80, the CPU is always in one of 
two (on some PDP-11 models, three) modes or states. The more powerful of the two 
is called kernel mode or supervisor state. The less powerful is called user mode or 
something similar. In user mode, some instructions, principally those that do I/O or 
affect the current mode, are forbidden and cause traps to kernel mode. In kernel 
mode, everything is allowed. When these machines are used for multiprogramming 
(time sharing), the user programs are forced to run in user mode to prevent them from 
interfering with each other. The operating system, in contrast, runs in kernel mode so 
that it can control the whole machine. A bit or field in the PSW determines the 
current mode. When an interrupt occurs on the 370 or PDP-11, the new PSW fetched 
determines which mode the interrupt routine will run in. In practice it is always ker- 
nel mode. On the 68000, no new PSW is loaded on interrupt, so the hardware always 
switches directly into kernel mode. 

The PDP-11 and 68000 have different hardware stack pointers for user mode and 
kernel mode. The kernel stack pointer points to the kernel stack, which is in an area 
of memory protected from user programs. When the interrupt hardware switches the 
CPU into kernel mode, it simultaneously switches stack pointers, so the program 
counter and PSW are saved on the (protected) kernel stack rather than on the user 
stack. 
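A sketch of both ideas together: a user-mode program that issues a privileged instruction traps into kernel mode, and the saved program counter and PSW land on the kernel stack because the active stack follows the mode. The opcode names are hypothetical, and the PSW is reduced to a single mode field.

```python
PRIVILEGED = {"START_IO", "SET_MODE"}   # hypothetical privileged opcodes


class ModeCPU:
    """Sketch of user/kernel modes with a separate stack for each mode."""

    def __init__(self):
        self.psw = {"mode": "user"}               # the mode is a field in the PSW
        self.pc = 0
        self.stacks = {"user": [], "kernel": []}  # kernel stack is protected memory

    @property
    def stack(self) -> list:
        # The active stack pointer is selected by the current mode.
        return self.stacks[self.psw["mode"]]

    def enter_kernel(self) -> None:
        """Trap entry: switch mode (and hence stack), then save PC and PSW."""
        saved = (self.pc, dict(self.psw))
        self.psw["mode"] = "kernel"
        self.stack.append(saved)                  # lands on the kernel stack

    def execute(self, opcode: str) -> str:
        if self.psw["mode"] == "user" and opcode in PRIVILEGED:
            self.enter_kernel()                   # forbidden in user mode: trap
            return "trap"
        self.pc += 1
        return "ok"


cpu = ModeCPU()
cpu.execute("ADD")        # "ok" in user mode
cpu.execute("START_IO")   # "trap": now in kernel mode, user stack untouched
```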

The Z80 interrupt system is more primitive than that of the PDP-11 and 68000. 
Two (rather than eight) interrupt levels are present: maskable and nonmaskable. The 
maskable interrupts can be disabled by the DI instruction; the nonmaskable interrupts 
cannot be disabled. The latter are frequently used for emergencies such as shutting 
down industrial process control equipment in the few milliseconds available after an 
impending power failure has been detected. 

The interrupt sequence consists of storing the program counter on the stack and 
disabling maskable interrupts. The accumulator and flags must be saved in software 
with the PUSH AF instruction. Nonmaskable interrupts always force control to 
address 102. Three different (software selected) modes are available for maskable 
interrupts. In mode 0, the interrupting device provides the next instruction to be exe- 
cuted on the data bus. Normally, it is an RST instruction. In mode 1, control is forced to 
address 56. In mode 2, the I register supplies the high-order 8 bits and the device supplies 
the low-order 8 bits (on the data bus) of the address of a table entry that holds the starting 
address of the interrupt service routine. This somewhat 
peculiar scheme is intimately related to the issue of 8080 compatibility. 
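The three maskable modes and the nonmaskable case can be summarized as an address computation. This Python sketch assumes the mode-0 device supplies an RST instruction (encoded 11ppp111, jumping to p × 8) and models memory as a 64 KB bytearray. Mode 2 reads the 16-bit address low byte first, which is how the Z80 stores words.

```python
NMI_ADDRESS = 0x0066     # 102 decimal
MODE1_ADDRESS = 0x0038   # 56 decimal


def z80_interrupt_target(memory: bytes, nonmaskable: bool, mode: int = 0,
                         i_register: int = 0, data_bus: int = 0) -> int:
    """Address at which the Z80 resumes after accepting an interrupt."""
    if nonmaskable:
        return NMI_ADDRESS
    if mode == 0:
        # Assume the device supplies RST p (binary 11ppp111), which jumps to p * 8.
        if (data_bus & 0xC7) != 0xC7:
            raise ValueError("this sketch only handles RST in mode 0")
        return data_bus & 0x38
    if mode == 1:
        return MODE1_ADDRESS
    if mode == 2:
        # I supplies the high byte and the device the low byte of a table
        # address; the word stored there (low byte first) is the routine's address.
        pointer = (i_register << 8) | data_bus
        return memory[pointer] | (memory[pointer + 1] << 8)
    raise ValueError("mode must be 0, 1, or 2")


memory = bytearray(65536)
memory[0x1234], memory[0x1235] = 0x00, 0x40    # hypothetical table entry -> 0x4000
print(hex(z80_interrupt_target(memory, False, mode=2, i_register=0x12, data_bus=0x34)))
```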



5.6. SUMMARY 

The conventional machine level is what most people think of as "machine 
language." At this level the machine has a byte- or word-oriented memory ranging 
MULTIPLE-PROCESSOR MANAGEMENT 



7.4.4. Valid Interrupts 

The local and I/O APICs support 240 distinct vectors in the range of 16 to 255. An interrupt's priority 
is implied by its vector, according to the following relationship: 

priority = vector / 16 (integer division) 

One is the lowest priority and 15 is the highest. Vectors 16 through 31 are reserved for exclusive 
use by the processor. The remaining vectors are for general use. The processor's local APIC 
includes an in-service entry and a holding entry for each priority level. To avoid losing inter- 
rupts, software should allocate no more than 2 interrupt vectors per priority. 
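The priority formula and the two-per-level guideline translate directly into code. Here is a Python sketch; the vector list in the usage line is hypothetical.

```python
from collections import Counter


def apic_priority(vector: int) -> int:
    """Priority implied by an APIC vector (integer division by 16)."""
    if not 16 <= vector <= 255:
        raise ValueError("APIC vectors range from 16 to 255")
    return vector // 16


def overloaded_priorities(vectors, max_per_priority: int = 2) -> dict:
    """Priority levels that have more than max_per_priority vectors assigned."""
    counts = Counter(apic_priority(v) for v in vectors)
    return {p: n for p, n in counts.items() if n > max_per_priority}


print(apic_priority(0x31))                                 # 3
print(overloaded_priorities([0x40, 0x41, 0x42, 0x50]))     # {4: 3}
```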



7.4.5. Interrupt Sources 

The local APIC can receive interrupts from the following sources: 

• Interrupt pins on the processor chip, driven by locally connected I/O devices. 

• A bus message from the I/O APIC, originated by an I/O device connected to the I/O APIC. 

• A bus message from another processor's local APIC, originated as an interprocessor 
interrupt. 

• The local APIC's programmable timer or the error register, through the self-interrupt 
generating mechanism. 

• Software, through the self-interrupt generating mechanism. 

• (P6 family processors.) The performance-monitoring counters. 

The local APIC services the I/O APIC and interprocessor interrupts according to the information 
included in the bus message (such as vector, trigger type, interrupt destination, etc.). Interpreta- 
tion of the processor's interrupt pins and the timer-generated interrupts is programmable, by 
means of the local vector table (LVT). To generate an interprocessor interrupt, the source 
processor programs its interrupt command register (ICR). The programming of the ICR causes 
generation of a corresponding interrupt bus message. See Section 7.4.11., "Local Vector Table", 
and Section 7.4.12., "Interprocessor and Self-Interrupts", for detailed information on program- 
ming the LVT and ICR, respectively. 



7.4.6. Bus Arbitration Overview 

Being connected on a common bus (the APIC bus), the local and I/O APICs have to arbitrate for 
permission to send a message on the APIC bus. Logically, the APIC bus is a wired-OR connec- 
tion, enabling more than one local APIC to send messages simultaneously. Each APIC issues its 
arbitration priority at the beginning of each message, and one winner is collectively selected 
following an arbitration round. At any given time, a local APIC's arbitration priority is a 
unique value from 0 to 15. The arbitration priority of each local APIC is dynamically modified 
after each successfully transmitted message to preserve fairness. See Section 7.4.16., "APIC Bus 
Arbitration Mechanism and Protocol", for a detailed discussion of bus arbitration. 
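Section 7.4.16 is not reproduced here, so the following is only a sketch under stated assumptions. Arbitration priorities are sent most significant bit first on the wired-OR line, and an agent drops out when it drives 0 but sees 1. As an assumed rotation rule, the winner's priority becomes 0 while every agent below it moves up by one, which keeps the values unique.

```python
def wired_or_winner(priorities: dict, bits: int = 4) -> str:
    """Bit-serial arbitration on a wired-OR line, most significant bit first.
    An agent that drives 0 but sees 1 drops out, so the highest unique
    priority wins."""
    contenders = set(priorities)
    for bit in reversed(range(bits)):
        line = any((priorities[a] >> bit) & 1 for a in contenders)  # wired-OR
        if line:
            contenders = {a for a in contenders if (priorities[a] >> bit) & 1}
    (winner,) = contenders     # unique priorities leave exactly one agent
    return winner


def rotate_after_win(priorities: dict, winner: str) -> dict:
    """ASSUMED fairness rule: the winner drops to 0; agents below it move up one."""
    top = priorities[winner]
    return {a: 0 if a == winner else (p + 1 if p < top else p)
            for a, p in priorities.items()}


priorities = {"A": 3, "B": 1, "C": 2}   # hypothetical agents
order = []
for _ in range(3):
    w = wired_or_winner(priorities)
    order.append(w)
    priorities = rotate_after_win(priorities, w)
print(order)  # ['A', 'C', 'B']
```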
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Changed: 8/30/95 Validation TechnologyPrinted: 8/30/95 Page l/3tel Rin 
DRAFT 

Instruction Set Architecture Testing 

Reed K. Christensen, m/s JF1-19, (503) 264-4619, rkc@ichips.intel.com 

1. Introduction 

The main objective of this project is to advance the technology of architectural-level 
testing of any Instruction Set Architecture (ISA), so that it is done in a formalized and systematic 
way. A software tool will be produced that will incorporate the validation technologies developed by 
the project. The customers for this tool are validators involved in specification-based validation. The 
specification in this case is the ISA of the processor. In the past, creating test cases for an ISA has been 
an informal process of a validator reading the specification and then deciding ad hoc which conditions 
and combinations should be tested. Since these test cases were not systematically developed, it is no 
surprise that a suite of this type is characterized as "incomplete" by the validators who use it. 

A slight diversion might be worthwhile at this point to define what architectural-level testing is, and 
how one would go about doing a "complete" job of it. Architectural-level testing consists of the 
following: 

1. setting architecturally visible state (registers, memory, flags) to known values 
2. executing a single instruction from the ISA 
3. checking architecturally visible state for changes (as defined by the behavioral description of the 
instruction just executed) 

Obviously, this type of testing is only a small part of validating a silicon implementation of an ISA, since 
it simplifies away the corner cases introduced by pipelining, caching, out-of-order execution, and a 
myriad of other features of a real silicon implementation. Good architectural-level testing can be thought 
of as a necessary, but not sufficient, condition for good overall validation of the processor implementation, 
and is perhaps the best place to begin the validation process (the top of the validation food chain), since it is 
the most abstract view of the required processor behavior. How does one do "complete" architectural- 
level testing? The model for architectural-level testing may seem simple, but even this level of testing is 
a large enough problem that complete testing does not imply exhaustive testing. For instance, even 
testing a simple register-to-register MOV instruction would require a very large number of test cases if one 
were to exhaustively check that every number can be successfully moved from every register to every 
register. Typically, data values are formed into equivalence classes, so that testing one value from the 
class is considered the same as testing all values of the class (see note 1). Boundary values from the class 
are also usually tested. 
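The three-step model combined with equivalence classes can be sketched in Python for a register-to-register MOV. The register names, the class representatives, and the idea of passing the implementation under test as a callable are all illustrative assumptions; a real harness would drive a simulator or silicon.

```python
REGISTERS = ["EAX", "EBX", "ECX", "EDX"]

# Hypothetical equivalence classes for 32-bit values; each list holds the
# boundary values chosen to stand for the whole class.
EQUIVALENCE_CLASSES = {
    "zero": [0],
    "positive": [1, 0x7FFFFFFF],
    "high bit set": [0x80000000, 0xFFFFFFFF],
}


def mov_reference(state: dict, dst: str, src: str) -> dict:
    """Behavioral description of MOV dst, src: only dst changes."""
    expected = dict(state)
    expected[dst] = state[src]
    return expected


def run_test(implementation, dst: str, src: str, value: int) -> bool:
    state = {r: 0xDEADBEEF for r in REGISTERS}       # 1. set visible state
    state[src] = value
    actual = implementation(dict(state), dst, src)   # 2. execute one instruction
    return actual == mov_reference(state, dst, src)  # 3. check visible state


def generate_cases():
    """Every register pair crossed with every representative value."""
    for dst in REGISTERS:
        for src in REGISTERS:
            for values in EQUIVALENCE_CLASSES.values():
                for value in values:
                    yield dst, src, value


cases = list(generate_cases())
print(len(cases))                                              # 80
print(all(run_test(mov_reference, *case) for case in cases))   # True
```

Sixteen register pairs times five representative values gives 80 cases, instead of 16 × 2^32 for exhaustive testing.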
With data values grouped into equivalence classes, complete architectural-level testing becomes the task 
of exploring the behavioral paths in each instruction. By traversing a path through an instruction, a list 
of state can be built up that would cause the path to be hit when the instruction is executed. Also, a 
list of state that should be altered by the instruction can be collected while traversing the path. The 
number of paths to be traversed may be too large to be exhaustively explored in the time available. 
Some means must be provided to prune the path choices and to reduce combinations or cross-products of 
things to try. 
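Path traversal over a behavioral graph can be sketched as a recursive generator that accumulates the conditions needed to reach each path and the state changes along it. The MOV graph below is hypothetical, and a simple path budget (itertools.islice) stands in for the pruning strategies the text calls for.

```python
from itertools import islice

# Hypothetical behavioral graph for a MOV with a register or memory source.
# Decision nodes: ("decide", condition, then_node, else_node)
# Action nodes:   ("act", [state changes], next_node); None ends a path.
MOV_GRAPH = {
    "start": ("decide", "src is memory", "fault_check", "size"),
    "fault_check": ("decide", "address faults", "fault", "size"),
    "fault": ("act", ["raise page fault"], None),
    "size": ("decide", "operand size is 16", "mov16", "mov32"),
    "mov16": ("act", ["dst[15:0] <- src[15:0]"], None),
    "mov32": ("act", ["dst <- src"], None),
}


def behavioral_paths(graph, node="start", conditions=(), changes=()):
    """Yield (state needed to hit the path, state the path alters) per path."""
    if node is None:
        yield list(conditions), list(changes)
        return
    if graph[node][0] == "decide":
        _, cond, then_node, else_node = graph[node]
        yield from behavioral_paths(graph, then_node, conditions + (cond,), changes)
        yield from behavioral_paths(graph, else_node,
                                    conditions + (f"not ({cond})",), changes)
    else:
        _, effects, next_node = graph[node]
        yield from behavioral_paths(graph, next_node, conditions,
                                    changes + tuple(effects))


# Pruning by budget: explore at most three paths.
for needed, altered in islice(behavioral_paths(MOV_GRAPH), 3):
    print(needed, "->", altered)
```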

1. Random testing of values from an equivalence class is often used to test the validator's assumption of 
equivalence. 


2. DaVinci Use 

A behavioral description of an instruction from the ISA can be written in a 
formal language, and then represented as a graph. The graph representation is a useful way of visualizing 
the behavioral paths through the instruction. Here is the description of a MOV instruction: 

(see mov.alg file) 

The daVinci representation provides a good visualization of the behavioral paths 
through this instruction: 

(see mov.daVinci) 

The software tools provided by this project use daVinci for much more than a static 
visualization of an instruction algorithm. The instruction graph is loaded by the dgm tool, where it can 
be manipulated by primitive graph routines. These routines are available to the user in a full 
(Tcl) scripting environment. As the graph in memory is modified by the validator, the corresponding 
visualization graph provided by daVinci is updated through the daVinci application interface. 

The validator is also able to modify the graph in dgm memory by using the daVinci graphical interface. 
Node(s) can be selected in the daVinci window, and a message will be sent from daVinci identifying 
them. Buttons are provided in a separate X window that cause the daVinci message to be 
interpreted to mean different actions, such as deleting a node, expanding a sub-graph in place of the node, 
etc. These cause modifications to the graph in dgm memory, which are, of course, reflected back to 
daVinci for display. 
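The two example actions, deleting a node and expanding a sub-graph in its place, can be sketched on a graph stored as a dict of successor lists. These functions are hypothetical stand-ins for the dgm primitives, whose actual names and signatures are not given here.

```python
def delete_node(graph: dict, node: str) -> None:
    """Remove node; its predecessors inherit its successors."""
    successors = graph.pop(node)
    for name in list(graph):
        targets = graph[name]
        if node in targets:
            kept = [t for t in targets if t != node]
            graph[name] = kept + [s for s in successors if s not in kept]


def expand_node(graph: dict, node: str, subgraph: dict,
                entry: str, exit_node: str) -> None:
    """Replace node by subgraph: edges into node now go to entry, and
    exit_node takes over node's successors."""
    successors = graph.pop(node)
    for name in list(graph):
        graph[name] = [entry if t == node else t for t in graph[name]]
    graph.update({k: list(v) for k, v in subgraph.items()})
    graph[exit_node] = graph[exit_node] + successors


g = {"A": ["B"], "B": ["C"], "C": []}
delete_node(g, "B")
print(g)   # {'A': ['C'], 'C': []}

g2 = {"A": ["B"], "B": ["C"], "C": []}
expand_node(g2, "B", {"B1": ["B2"], "B2": []}, "B1", "B2")
print(g2)  # {'A': ['B1'], 'C': [], 'B1': ['B2'], 'B2': ['C']}
```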

(see mov_big.daVinci for a modified version of the original mov.daVinci graph) 
Figure 1: iATS Functional Model [jmp.graf, Tcl Library, Tk Library, Expect Library, daVinci] 
Guest Viewpoint: 
Is Out-of-Order Out of Date? 

IA-64's Parallel Architecture Will Improve Processor Performance 

By William S. Worley Jr., HP Labs, and Jerry Huck, IA-64 Architecture Lab {2/7/00-02} 



Microprocessors are on a relentless path to higher performance. Every innovation in 
computing (data mining, Java programming, distributed computing on the Internet, 
multimedia data streams, and so on) invariably requires greater computing power. Even 
traditional database processing and technical computing 
have increasing problem sizes that drive demand for 
higher-performance microprocessors. 

To meet these and other future requirements, new 
approaches to exploit improvements in IC processes are 
needed. Today, nearly all microprocessors exploit paral- 
lelism to accomplish more work in less time. We believe that 
parallelism can best be exploited with a computer architec- 
ture that is designed from the ground up to support 
instruction-level parallelism (ILP). We have termed this 
style of architecture EPIC, for explicitly parallel 
instruction-set computing. IA-64, developed jointly by HP and Intel, is 
such an architecture (see MPR 5/31/99-01, "IA-64: A Paral- 
lel Instruction Set"). 

IA-64 enables the compiler to express more paral- 
lelism to the machine than is possible with existing RISC or 
CISC architectures. As a result, IA-64 significantly reduces 
the hardware cost of detecting and scheduling the paral- 
lelism among instructions. The ability to specify this paral- 
lelism directly is one of IA-64's primary advantages. 

Through the 1980s, RISC and CISC architectures were 
not designed primarily for high ILP. Instead, they were 
designed to make the best use of the technology that was 
available at the time. The RS/6000 and Alpha architectures 
adopted similar computing resources and instruction set 
formats. The inability of RISC and CISC architectures to 
express parallelism directly can be overcome to some degree 
by adopting complex, nonarchitectural approaches, princi- 
pally out-of-order (OOO) dynamic superscalar hardware. 
Although preserving customer investments and supporting 
an installed system base are good business reasons for 
developing such processors for legacy architectures, the 
advent of IA-64 eliminates the performance of OOO hard- 
ware as a compelling technical reason to do so. 

The growing market requirement for a higher- 
performance 64-bit architecture and absence of existing 
Intel 64-bit binaries gave HP and Intel an opportunity to 
create something new. The companies took advantage of 
this opportunity — along with lessons learned from the past 
15-20 years of computing evolution — to create a new archi- 
tecture with performance characteristics superior to those 
of existing RISC and CISC architectures. 

Not Just for ILP 

A microprocessor that minimizes computation cycles 
makes a better building block for high-performance sys- 
tems. Such a microprocessor can, for example, be repli- 
cated to build multiprocessor systems. Fast CPUs in an 
MP configuration reduce queuing and contention, and 
they provide greater overall throughput than a larger 
number of slower processors, a fact that we have seen 
many times in OLTP benchmarks in which small numbers 
of PA-RISC processors outperformed larger numbers of 
slower processors. 

The view that parallelism is achieved most effectively 
through higher levels of multiprogramming is unproved, 
and, to the extent that it may be true, does not provide the 
complete picture. It fails to appreciate the fact that paral- 
lelism must be improved at all levels of a system. Providing 
parallelism solely through hardware-based multithreading, 
simultaneous multithreading (SMT), or chip-level multi- 
processing (CMP) cannot compensate for the lack of paral- 
lelism in the basic processing element. This is obvious for 
single-threaded code, but it is true even for some multi- 
threaded code, such as the encryption codes mentioned later. 

SMT and CMP apply equally to RISC, CISC, and 
EPIC microprocessors. The first paper design of an EPIC 
machine at HP Labs in 1991 envisioned integrated hardware 
multithreading as an orthogonal complement to EPIC 
architecture capabilities. But SMT has its downsides. The 
nonlinear nature of caches can be a problem for some 
workloads. Instead of one thread thrashing the cache on 
one processor, one thread on an SMT processor can thrash 
the cache for all threads on the processor. Finding the best 
design for multiple working sets to share a single cache is a 
research problem that has generated several papers but, as 
yet, no clear solutions. 

Effective utilization of the hardware resources of a 
modern microprocessor is difficult. Historically, we have 
found that doubling the number of function units of a RISC 
processor has resulted in less-than-linear scaling. IA-64 was 
specifically designed to utilize additional function units 
effectively. Defenders of superscalar RISC architectures 
argue that out-of-order processing is the best means to 
achieve high function-unit utilization. 

Building an out-of-order processor, however, is com- 
plex and difficult. The original PA-8000, for example, used as 
many transistors in its reorder buffer as were used in the 
entire previous-generation PA-7200 chip. Most current-gen- 
eration OOO engines are four-issue implementations, and 
our studies indicate that the complexity of these machines 
will scale quadratically for 1.5x or 2x increases in issue 
width. In contrast, the first member of the IA-64 family, 
Itanium (née Merced), is already a six-issue machine. 
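The quadratic-scaling claim is easy to quantify. Assuming, as the studies cited above indicate, that complexity grows with the square of issue width:

```python
def relative_complexity(width_ratio: float) -> float:
    """Complexity growth factor if complexity scales with issue width squared."""
    return width_ratio ** 2


print(relative_complexity(1.5))  # 2.25, e.g. going from four-issue to six-issue
print(relative_complexity(2))    # 4, e.g. going from four-issue to eight-issue
```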

Architecture vs. Implementation 

Architectural influences are determinative for many parts of 
an implementation, but not for all. The speed of an ALU, for 
example, is primarily a function of IC process and word 
width. The repertoire of operations that the ALU can per- 
form, however, is a second-order issue. In addition, the 
data-cache hierarchy and the memory system of RISC and 
EPIC processors are largely independent of the architecture. 
Cache and memory-system interfaces must meet the 
demands of the processor, be it RISC, CISC, or EPIC. 
Criticizing the IA-64 architecture on the basis of an initial 
memory-system design, which was chosen to balance cost 
and performance, is a bit unfair. 

Some assert that memory systems can be more fully 
utilized by OOO RISC designs. Our analysis of contention, 
cache behavior, buffer queuing, processor affinity, memory 
interleaving, and other factors indicates that one can find 
better approaches to use available memory technology if 
one is willing to accept additional cost. 

The IC process, the number of registers, the number of 
register ports, the bypass network, and the number of cache 
ports are the principal factors in determining the cycle time 
of an IA-64 processor. Any RISC or CISC design faces simi- 
lar challenges. The critical path in many modern micro- 
processors, IA-64 processors included, is found in the func- 
tion units and their bypass networks. But IA-64 processors 
distinguish themselves by higher utilization of this funda- 
mental structure. As with all designs, a balance was sought 
among the clock rate, pipeline depth, and execution width. 

The first IA-64 processors will be used in high-end 
servers and workstations. Over time, designs will broaden 
out to span a wide range of markets. An HP Labs study for 
high-performance embedded controllers has confirmed 
that EPIC-like machines are exceptionally effective. 

IA-64's Parallelism Features 

Several IA-64 capabilities express and enhance parallel exe- 
cution. The first is predication, which reduces the number 
of encountered branches, mispredicted branches, and other 
obstacles to finding parallelism. Our studies conclude that 
the gain from branch reduction more than compensates for 
any extra instructions that might be executed. (Note: do not
think of a single if clause; think of merging three to five
different basic blocks into a single, branchless,
critical-path-limited sequence of code.) A mispredicted branch disrupts
the pipeline. The lost opportunity can be measured as the 
width of the machine times the length in cycles of the mis- 
predict penalty. For newer RISC processors, this is often 
more than 20 instruction slots. 
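The lost-opportunity estimate above is simple arithmetic, sketched below. The helper names are hypothetical, and the 4-wide machine, 6-cycle penalty, and 5% misprediction rate in the example are illustrative assumptions, not measurements of any particular processor.

```python
def lost_issue_slots(issue_width: int, penalty_cycles: int) -> int:
    """Issue slots wasted by one mispredicted branch: machine width
    multiplied by the mispredict penalty in cycles."""
    return issue_width * penalty_cycles


def slots_lost_per_1000_branches(issue_width: int, penalty_cycles: int,
                                 mispredict_rate: float) -> float:
    """Expected issue slots lost per 1,000 executed branches for an
    assumed misprediction rate between 0.0 and 1.0."""
    return 1000 * mispredict_rate * lost_issue_slots(issue_width, penalty_cycles)


if __name__ == "__main__":
    # Hypothetical 4-wide machine with a 6-cycle mispredict penalty.
    print(lost_issue_slots(4, 6))                    # 24
    print(slots_lost_per_1000_branches(4, 6, 0.05))  # 1200.0
```

Both factors matter: better predictors lower the rate, while predication lowers the number of branches that can mispredict at all.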

IA-64 specifies a large register set (128 GP registers and 
128 FP registers), a feature that allows algorithm design to be 
fundamentally changed. Matrix operations, finite-element 
analysis, and many other technical algorithms can be 
restructured to take better advantage of the large register 
space. OOO superscalar advocates argue that their internal 
register-renaming hardware can bring to bear just as many
register resources as IA-64. Rename registers, however, are 
not as effective as real registers for either the programmer or 
the hardware. 

Having only 32 architecturally visible registers, as most 
RISC architectures do, requires that a programmer struc- 
ture his or her code in such a manner that, at any point in 
time, the registers containing program state do not exceed 
32. For problems such as 1,024-bit RSA encryption, one can 
effectively use many more than 32 general registers and 32 
floating-point registers. Holding just two 512-bit operands
and one 1,024-bit intermediate result fills 32 64-bit regis- 
ters. For a RISC or CISC processor, the RSA program would 
require many housekeeping load and store instructions 
whose only productive functions are to limit the instanta- 
neous general- and floating-point-register state. Although 
the underlying hardware may have many more internal 
rename registers, the programmer has no direct means to 
use these registers to hold program state. Two of the AES-study
codes, mentioned later, used over 60 general registers.
Having this many architecturally visible resources, and the 
IA-64 register rotation, enables coding strategies that simply 
are not possible with an architected set of only 32 registers. 
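The register count in the RSA example follows from rounding each operand up to whole 64-bit words. A minimal check, assuming 64-bit general registers and that every operand is held entirely in registers (the function name is hypothetical):

```python
WORD_BITS = 64  # width of one IA-64 general register


def registers_needed(*operand_bits: int) -> int:
    """Number of 64-bit registers needed to hold every given operand
    entirely in registers (each operand rounded up to whole words)."""
    return sum(-(-bits // WORD_BITS) for bits in operand_bits)  # ceiling division


if __name__ == "__main__":
    # Two 512-bit operands plus one 1,024-bit intermediate result:
    # 8 + 8 + 16 = 32 registers, the whole architected file of a typical RISC.
    print(registers_needed(512, 512, 1024))  # 32
```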

On the hardware side, even though OOO superscalar 
implementations normally have extra internal registers, 
during every cycle they must make visible only the 32 reg- 
isters of program state. Furthermore, in the event of an 
interruption, the hardware must be prepared to lose (and 
later perhaps partially reconstruct) all the nonvisible inter- 
nal state. Thus, as is the case for the programmer and for 
the executable code, the hardware's use of additional regis- 
ter resources is handicapped by the need to maintain the 
fiction that the sole register state is that constituted by the 
32 registers. 

Most significantly, IA-64 introduces a collection of features
to deal with memory latencies, which continue to
increase relative to processor speeds. IA-64's control and 
data speculation capabilities enable compiler-directed 
access to variables at points much earlier than they are 
needed for computation. This capability permits a greater 
degree of concurrency between executing instructions and 
memory accesses. OOO engines achieve a similar effect 
with dynamic hardware, but they are restricted to fixed 
hardware algorithms for correctly predicting the execution 
path and for triggering memory fetches. HP's analysis of 
the PA-8000 shows that the primary benefit of OOO oper- 
ation in commercial workloads lies in initiating multiple 
earlier cache misses. IA-64 enables such acceleration on an 
even broader scale. 
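Control speculation can be pictured with a small simulation. In IA-64, a load hoisted above its guarding branch is issued as a speculative load (`ld.s`); instead of faulting, it marks its target register with a NaT ("Not a Thing") bit, and a `chk.s` at the load's original position branches to recovery code if that bit is set. The Python below is only a simplified model of that idea: `SpecValue`, `speculative_load`, `check_and_use`, and `guarded_add` are hypothetical names, a missing dictionary key stands in for a faulting address, and data speculation is not modeled.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SpecValue:
    """Register contents plus a deferred-fault flag (modeled on the NaT bit)."""
    value: Optional[int]
    deferred_fault: bool


def speculative_load(memory: Dict[int, int], addr: Optional[int]) -> SpecValue:
    """Load hoisted above its guarding branch (cf. ld.s): it never raises;
    a would-be fault is only recorded in the result."""
    if addr is not None and addr in memory:
        return SpecValue(memory[addr], False)
    return SpecValue(None, True)


def check_and_use(spec: SpecValue, memory: Dict[int, int], addr: int) -> int:
    """At the load's original position (cf. chk.s): if the early load
    faulted, recover by reloading non-speculatively, so the fault is
    raised only on the path that really needs the value."""
    if spec.deferred_fault:
        return memory[addr]  # KeyError stands in for a real memory fault
    return spec.value


def guarded_add(memory: Dict[int, int], ptr: Optional[int], other: int) -> int:
    early = speculative_load(memory, ptr)  # issued as early as possible
    # ... independent work would overlap with the load latency here ...
    if ptr is None:                         # the branch the load was moved above
        return other
    return other + check_and_use(early, memory, ptr)
```

With `mem = {100: 7}`, `guarded_add(mem, None, 1)` returns 1 even though the hoisted load "faulted", because the deferred fault is never checked on that path.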

Involving the compiler in the process of identifying 
speculative load candidates opens a bigger window into the 
program than can practically be achieved by an OOO 
superscalar processor. Data profiling makes the compiler 
even more accurate at selecting variables for speculative 
handling. The IA-64 compiler has heuristics to control the 
degree of speculation, and the programmer has control over 
the compiler's heuristics. 

Another important feature of IA-64 is its register stack 
engine (RSE). One might consider this feature a built-in 
asynchronous hardware thread that runs when there would 
otherwise be idle memory ports. This feature reduces the 
cost of procedure calls and returns and increases the utiliza- 
tion of the register file. It is especially valuable for accelerat- 
ing call-intensive object-oriented code. The reduction in the 
time to spill and fill the general register file is significant for
many applications. On one database benchmark, for example,
RISC processors spend about 30% of their memory references
on procedure entry/return housekeeping. Most of
this overhead is eliminated by IA-64's RSE. The IA-64 architecture
has been crafted carefully to make the hardware
design of the RSE straightforward. The RSE does not add to 
the critical path of the machine and is a relatively small part 
of the Itanium and McKinley designs. 
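A toy model helps show why the RSE removes most call-related spill/fill traffic: registers are spilled only when the pool of physical stacked registers actually overflows, and filled only when a return exposes a caller frame that was spilled. This is a sketch under simplifying assumptions: the class and method names are hypothetical, spills and fills happen lazily at call and return time rather than asynchronously in the background as described above, and the default pool of 96 corresponds to IA-64's stacked registers r32 to r127.

```python
from typing import List


class RegisterStack:
    """Toy register stack engine: each call allocates a frame from a fixed
    pool of stacked registers; overflow spills the oldest registers to a
    backing store, and returns fill them back when the caller needs them."""

    def __init__(self, physical_regs: int = 96):
        self.capacity = physical_regs
        self.frames: List[int] = []  # sizes of active frames, innermost last
        self.resident = 0            # stacked registers currently on chip
        self.spills = 0              # register words stored to the backing store
        self.fills = 0               # register words loaded back from it

    def call(self, frame_size: int) -> None:
        assert 0 < frame_size <= self.capacity
        self.frames.append(frame_size)
        self.resident += frame_size
        overflow = self.resident - self.capacity
        if overflow > 0:             # spill the oldest (outermost) registers
            self.spills += overflow
            self.resident = self.capacity

    def ret(self) -> None:
        self.resident -= self.frames.pop()
        if self.frames and self.resident < self.frames[-1]:
            missing = self.frames[-1] - self.resident  # part of caller was spilled
            self.fills += missing
            self.resident += missing


if __name__ == "__main__":
    rse = RegisterStack()
    for _ in range(5):   # call chain five deep, 30 stacked registers each
        rse.call(30)
    for _ in range(5):
        rse.ret()
    print(rse.spills, rse.fills)  # 54 54
```

A shallow call that fits in the pool causes no memory traffic at all, whereas a conventional convention would save and restore registers on every call.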

Putting It All Together 

All of the IA-64 elements mentioned above combine syner- 
gistically to minimize the critical code path through a pro- 
gram. The size of the resulting program binary image may 
be larger, but IA-64's instruction stream is more linear, i.e., 
it contains fewer branches. Itanium and McKinley both 
compensate for this code growth with special mechanisms 
that efficiently deliver instructions to the processor. These 
mechanisms eliminate the effects of the increased code size 
with only modest area and design costs. 

In a paper submitted to the NIST AES3 (Advanced
Encryption Standard) conference by HP Labs researchers,
the five final AES algorithms were analyzed for both
PA-RISC and IA-64. This study shows that IA-64's register
file and its wide parallel architecture are very effective. The 
full issue width of the machine was utilized by some of these 
codes. Not surprisingly, the two algorithms with the greatest 
theoretical parallelism, Twofish and Rijndael, showed the
greatest function unit utilization. Eight of the 15 IA-64 
codes (encryption, decryption, and keying for each of the 
five finalists) used more than 32 registers. Six of the 15 
IA-64 codes had smaller code sizes than PA-RISC, due to the
compact modulo-scheduling support in IA-64. In two cases, 
the code was more than four times smaller. Overall, the 
IA-64 code size was only 27% larger than PA-RISC, even
though no explicit effort had been made to minimize code 
size on either architecture. (A quick look at the IA-64 code 
showed that the difference could have been reduced to the 
10% range.) 

Although these are not typical codes, they illustrate 
how RISC code compares with IA-64 code using identical 
goals and algorithms. Equivalent comparisons using com- 
piled code are not yet available, because IA-64 compilers are 
still maturing. The AES study goes into detail on mapping 
the final AES algorithms onto PA-RISC and IA-64 machines 
and architectures. Features like predication and rotation 
played important parts in reducing IA-64 critical code paths 
and exposing parallelism. In one simple example, a recurrence
was trivially handled by referencing back into the rotating-register
region; no extra copy or unrolling was needed.
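The rotating-register treatment of a recurrence can be modeled with a register rename base (RRB) that is decremented at the end of each loop iteration, so a value written to logical register 0 in one iteration is read as logical register 1 in the next. The sketch below assumes logical registers numbered from 0 (IA-64's rotating general registers actually begin at r32), and its class and function names are hypothetical. It computes the recurrence y[i] = y[i-1] + x[i] without any register-to-register copy.

```python
from typing import List


class RotatingRegisters:
    """Toy rotating register file. Logical register n maps to physical slot
    (n + rrb) mod size; decrementing rrb at the end of each iteration makes
    the value last written to r0 readable as r1, the one in r1 as r2, etc."""

    def __init__(self, size: int):
        self.size = size
        self.phys = [0] * size
        self.rrb = 0  # register rename base

    def _slot(self, logical: int) -> int:
        return (logical + self.rrb) % self.size

    def read(self, logical: int) -> int:
        return self.phys[self._slot(logical)]

    def write(self, logical: int, value: int) -> None:
        self.phys[self._slot(logical)] = value

    def rotate(self) -> None:
        # Models the rotation performed by the loop-closing branch.
        self.rrb = (self.rrb - 1) % self.size


def running_sum(xs: List[int]) -> List[int]:
    """Recurrence y[i] = y[i-1] + x[i], with y[-1] = 0 (registers start at 0).
    The previous result is read back through r1, so it is never copied."""
    regs = RotatingRegisters(8)
    out = []
    for x in xs:
        y = regs.read(1) + x  # r1: what the loop body wrote to r0 last iteration
        regs.write(0, y)
        out.append(y)
        regs.rotate()
    return out


if __name__ == "__main__":
    print(running_sum([1, 2, 3, 4]))  # [1, 3, 6, 10]
```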

IA-64 Compilers Find Parallelism 

IA-64 builds on proven compiler techniques to extract paral- 
lelism from applications. Many of these techniques are used 
by existing compilers to gain the best performance for today's 
OOO RISC systems. These techniques include data prefetch, 
branch hints, loop unrolling, and profile-based path structuring,
as well as other well-established optimizations.

As an example, OOO RISC compilers generate better 
code when using profile results. On the PA-8000, reducing 
the number of taken branches, through profiling, improves 
branch prediction. This is especially valuable in large com- 
mercial codes, where branch prediction tables are not very 
effective. 

Historically, every major improvement in instruction- 
level parallelism has required the use of new code-generation 
techniques. Only with such techniques can the compiler real- 
ize significant performance gains. The PA-8000 and Alpha 
21264, for example, used new binaries to get optimal performance.
The initially hoped-for transparent performance
gains from OOO superscalar machines have not materialized.
Meticulous code scheduling by the compiler has proved just
as essential for OOO engines as it will be for IA-64.

Code profiling is only slightly more important for 
IA-64 code generation than it is for an OOO processor.
Without profiling, an OOO engine can easily get lost down 
the wrong path, due to branch mispredictions and false 
dynamic speculation, especially in large-footprint applications.
Instructions that will not be executed are cached, and
in-flight instructions are canceled. 

As a further example, the performance of the SPECfp95
benchmark is greatly improved by inserting prefetch instructions.
The OOO engine is not effective in automatically triggering
the proper prefetches. OOO queues are generally not
deep enough and do not understand which data will be 
needed. On the other hand, the compiler is able to analyze 
the data layout and trigger the best prefetches. 
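One common way for a compiler to place prefetches is to compute a prefetch distance: the number of loop iterations whose work covers one memory latency. The sketch below applies that heuristic to a sequential array walk and issues one prefetch per cache line. It is illustrative only: the function name, the 64-byte line size, and the latency and per-iteration cycle counts in the example are assumptions, and it is not claimed to be the algorithm of any HP or Intel compiler.

```python
import math
from typing import List, Tuple


def prefetch_plan(n_elems: int, elem_bytes: int, line_bytes: int,
                  miss_latency: int, cycles_per_iter: int,
                  base: int = 0) -> Tuple[int, List[Tuple[int, int]]]:
    """Plan software prefetches for a sequential walk over an array.

    distance: iterations of loop work needed to hide one miss latency.
    In iteration i the loop prefetches the element `distance` iterations
    ahead, but only when that element starts a new cache line, so each
    line is prefetched once. Lines that begin before element `distance`
    are not covered; a compiler would add prologue prefetches for them."""
    distance = math.ceil(miss_latency / cycles_per_iter)
    elems_per_line = line_bytes // elem_bytes
    plan = []
    for i in range(n_elems):
        target = i + distance
        if target < n_elems and target % elems_per_line == 0:
            plan.append((i, base + target * elem_bytes))  # (iteration, address)
    return distance, plan


if __name__ == "__main__":
    # Assumed: 8-byte elements, 64-byte lines, 100-cycle miss, 10 cycles/iteration.
    distance, plan = prefetch_plan(64, 8, 64, 100, 10)
    print(distance)  # 10
    print(plan[:2])  # [(6, 128), (14, 192)]
```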

Compiler writers, and those who have hand-coded for 
OOO machines, talk of the frustration in understanding
how to second-guess the limitations of the OOO hardware
and work around them to achieve full utilization of the 
function units. The job usually boils down to trial and error. 
This process actually occurred for PA-RISC codes during 
the AES study. 

Another significant issue with OOO design is sustaining
the most critical memory references in flight. To the
OOO engine, everything is equal. The IA-64 compiler, on
the other hand, is able to locate the critical path through the 
code and to ensure that the important long-latency opera- 
tions are started first. Memory buffer and other limitations 
will always mandate executing critical path instructions 
first. As noted in an earlier paragraph, the compiler will 
issue prefetches to ensure early initiation of the most-likely 
cache misses. 
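Starting long-latency operations first is typically done with critical-path list scheduling: each operation is ranked by the longest latency-weighted path from it to the end of the block. A minimal sketch, with a hypothetical function name and made-up latencies:

```python
from functools import lru_cache
from typing import Dict, List, Sequence


def critical_path_order(latency: Dict[str, int],
                        succs: Dict[str, Sequence[str]]) -> List[str]:
    """Rank the operations of a basic block's dependence DAG by height:
    the op's own latency plus the largest height among its successors.
    Ties are broken alphabetically to keep the result deterministic."""

    @lru_cache(maxsize=None)
    def height(op: str) -> int:
        return latency[op] + max((height(s) for s in succs.get(op, ())), default=0)

    return sorted(latency, key=lambda op: (-height(op), op))


if __name__ == "__main__":
    # Hypothetical block: a 10-cycle load feeds an add, which feeds a store.
    latency = {"load_a": 10, "add": 1, "store": 1, "mul_b": 3, "inc_i": 1}
    succs = {"load_a": ["add"], "add": ["store"], "mul_b": ["store"]}
    print(critical_path_order(latency, succs))
    # ['load_a', 'mul_b', 'add', 'inc_i', 'store']
```

Because every latency is at least one cycle, an operation always ranks above its successors, so the sorted order is also a valid dependence order.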

To complement the compiler's expanded ability to 
avoid cache misses, hardware resources still can be brought 
to bear in an IA-64 implementation. As an example, we have 
developed several strategies to improve the handling of 
cache misses. An in-order machine has options beyond a 
simple stall when a cache miss occurs. Running ahead with 
rollback, predictive address buffers, implicit prefetch, buddy
prefetch, and other dynamic approaches can all be effective.
Quantitatively, these techniques are far simpler than those 
used in OOO designs. 

Static code generation is just one aspect of producing 
efficient code. The recent announcement of the Crusoe 
processor by Transmeta hints at the benefits of dynamic code 
generation. In many venues, researchers have been examin- 
ing the significant performance improvements that can be 
achieved by dynamic measurement and tuning of code. 

HP, for example, implemented a simple mechanism 
that sampled the current instruction during the normal 
timer interrupt. If the instruction was a kernel branch, the 
branch hint was rewritten to match the actual program 
flow. This one dynamic mechanism resulted in a 5% 
improvement for database applications. 
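A sampling-based hint rewriter of the kind described above might look like the following. This is a hypothetical sketch: samples are assumed to arrive as (branch address, taken?) pairs, the rewritable hints are a plain dictionary, and each hint is set by majority vote over all samples rather than on each individual sample.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def rewrite_hints(samples: Iterable[Tuple[int, bool]],
                  hints: Dict[int, bool]) -> int:
    """Update static branch hints in place from timer-tick samples.

    samples: (branch_address, was_taken) pairs observed when a sample hit a branch
    hints:   branch_address -> current hint (True means "predict taken")
    Returns how many hints were rewritten. Branches absent from `hints`
    (for example, code whose hints may not be rewritten) are ignored."""
    seen, taken = Counter(), Counter()
    for addr, was_taken in samples:
        seen[addr] += 1
        taken[addr] += int(was_taken)
    changed = 0
    for addr, count in seen.items():
        predict_taken = 2 * taken[addr] > count  # strict majority taken
        if addr in hints and hints[addr] != predict_taken:
            hints[addr] = predict_taken
            changed += 1
    return changed
```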

More aggressive approaches are possible. The Itanium 
processor is able to watch a program's cache misses, and 
software can apply a simple heuristic to set prefetch hints 
correctly. This type of explicit approach, driven by runtime 
information, is extremely accurate and selective. Better per- 
formance feedback enables greater tuning accuracy and bet- 
ter utilization of the machine resources. IA-64 has hint 
fields in most branch and memory-reference instructions. 
Armed with runtime information for the executing pro- 
gram, hint fields can be rewritten on the fly to match the 
needs of the current workload. 

These approaches allow software to tune code dynam- 
ically for performance. With such techniques, and without 
recompilation, performance can continue to evolve and 
improve, even after a machine and application have been put
into production. IA-64 provides a greater range of tuning 
options than previous architectures. 

Future Directions 

The first IA-64 microprocessor, Itanium, exhibits ILP 
beyond that achievable by any OOO superscalar microprocessor
in existence. Extensive study and analysis went
into the EPIC architectural ideas, and they have been found 
to work. Itanium is just the initial implementation of the 
IA-64 architecture. It delivers the powerful EPIC innova- 
tions while providing complete binary compatibility with 
IA-32 and PA-RISC. 

Future designs will be even more powerful. We are 
now at just the beginning of the IA-64 hardware and soft- 
ware implementation learning curves. As was the case for 
RISC and CISC architectures, the IA-64 architecture will 
evolve and become even more powerful. As was the case for 
RISC and CISC implementations, IA-64 implementations 
will mature and evolve through successive generations. And, 
as was the case for RISC and CISC software, the EPIC com- 
pilers will become better and better at exploiting the full 
capabilities of the architecture. 

Any initial implementation of a new architecture will 
be conservative and will concentrate on the most important 
architectural elements. Itanium and McKinley are no
exception to this rule. The initial version of the IA-64
architecture by no means encompasses all the innovations 
and ideas developed by HP and Intel. The initial hardware 
implementations by no means embody all the techniques 
and designs envisioned by HP and Intel. Since the early 
1990s, HP Labs has been evaluating scalability for high ILP, 
multiprocessing, hardware multithreading, and high-bandwidth
memory systems. HP has also considered many static
and dynamic compilation and simulation techniques. 

IA-64 Will Deliver More ILP and Performance 

IA-64 will deliver on its promise of expressing, enhancing, 
and exploiting instruction-level parallelism to improve per- 
formance. The IA-64 architecture will not remain static or 
fixed, and successive generations of processors will each 
introduce innovations. By the time we have third- and
fourth-generation chips, we are confident that the present
architectural controversy will have passed, and EPIC will 
have proved its superiority. 

Bill Worley is a principal architect on two HP architectures:
PA-RISC and PA-WideWord, the latter of which became
the basis for HP's collaboration with Intel on IA-64. In 1995,
Bill was named a Distinguished Contributor, the highest technical
position at HP. Bill can be contacted at worley@hpl.hp.com.

Jerry Huck manages processor architecture in HP's com- 
puter products organization. These days his time is split 
between working with Intel on the IA-64 architecture and 
managing the team responsible for processor simulator and 
platform architecture definition. Jerry's team was also responsible
for the evolution of the PA-RISC architecture. Jerry can
be reached at jerry_huck@hp.com.

ABSTRACT: Two trends call into question the current
practice of microprocessors and DRAMs being 
fabricated as different chips on different fab lines: 1) 
the gap between the speed of processor and the speed 
of DRAM is growing at 50% per year; and 2) the size 
and organization of memory on a single DRAM chip 
is becoming awkward to use in a system, yet size is 
growing at 60% per year. Intelligent RAM, or IRAM, 
merges processing and memory into a single chip to 
lower memory latency and increase memory 
bandwidth as well as to select the best memory size 
and organization for an application. In addition, 
IRAM promises savings in power and board area.

1. INTRODUCTION 

The IRAM project is architecting, fabricating and
evaluating a single chip supercomputer that combines 
a configurable processor and high capacity DRAM to 
deliver vector supercomputer-style sustained floating 
point and memory performance, at vastly reduced 
power. This chip will be called "IRAM", standing for 
Intelligent RAM, since most of the transistors on this
merged chip will be devoted to memory. The goal is 
to demonstrate that a single chip with a simple 
processor and a very high bandwidth local memory 
can be faster on memory-intensive problems as well 
as be a much better match to real-time applications. 
Given that conventional machines will have separate 
chips for the processor, external cache, main 
memory, and networking, an IRAM would also be 
smaller, use less power, and be less expensive. The 
design targets the gigabit generation of DRAM, 
which offers 128 megabytes per chip. 

A second objective of this project is to design and 
prototype a multi-chip system in which processing is 
added to various levels of the storage system, called 
Intelligent Storage (ISTORE). Such a system is
suited to I/O-intensive applications such as decision
support databases, when the IRAM chip is placed 
inside each disk to eliminate the I/O bottleneck of 
centralized servers. It is also designed for large 
memory-intensive computations that are too large for
a single chip. 

Project web page: http://iram.cs.berkeley.edu/ 

2. PROGRESS 

The IRAM group made progress on several fronts 
over the past year. 



3. CHIP DEVELOPMENT 

3.1 VIRAM Testchip 

The test chip was submitted to our industrial partner, 
LG Semicon, in November 1998. The chips were 
originally scheduled for fabrication in March, but a 
strike at LG Semicon delayed the chips until
June. Problems with the DRAM fab continued, and
Hyundai purchased the DRAM portion of LG.
Although we finally received parts, they arrived so
late, and the DRAM is known to be so flawed, that
the value of the test chip is questionable. We
consider this phase completed, and do not expect to 
continue any significant effort on this part of the 
project. 

3.2 Serial I/O Test Chip 

The I/O testchip was successfully completed in 1998, 
as described in the last report. The power and area 
results are considerably lower than those of previously
published 1 Gbit/sec serial links (400 mW and 4.4
mm² in a 0.5 µm process). We consider this phase
completed, and do not expect to continue any 
significant effort on this part of the project. 

3.3 VIRAM-1 

We continued to investigate relationships with TSMC 
and IBM as potential industrial partners for the final
VIRAM-1 chip. We went forward on a partnership 
with IBM, which looked very promising from both a 
technical perspective (better die size than TSMC) and 
because of a strong commitment on their end to the 
IRAM project. We have received some of the 
technical information from IBM and have signed a 
temporary nondisclosure agreement to get additional 
information such as Spice models for the DRAM. We 
also worked on a parallel interface for VIRAM-1 to
simplify interfacing to other devices. 

On May 1, 1999 we completed a preliminary 
management agreement with IBM and obtained data 
about the IBM DRAM macro and other chip level 
specifications that are needed for the final VIRAM-1 
design. 

The chip will contain DRAM memory, a scalar 
processing core, a floating-point co-processor, a 
vector unit with 6 64-bit pipes (which can also be 
subdivided into 32- or 16-bit lanes) and a network
interface (referred to as parallel I/O lines in previous 
reports). The high-level design of each of these
components was completed prior to this year, with
the exception of the network interface and a DMA 
engine; the network interface was completed and the 
DMA design is in progress. We completed the 
Verilog (RTL) models for most of the control for the 
memory pipeline and the integer and floating-point 
units. Some details that involve the TLB and 
floating-point interfaces remain to be tested. We 
have also completed the circuit design and layout of 
the integer multiplier, which is based on a modular 
design involving a 16-way repetition of a single 
circuit block. 

We have also developed a verification framework for 
testing the lower-level designs and, eventually, the
chip. The ISA simulator is used as the "golden 
model" against which the other designs are tested. 
The verification framework allows for three basic 
kinds of tests. 1) Self-checking provides the output
(in terms of register/memory values) which is 
compared against the values produced by a simulator. 
2) Trace comparison is used to compare the traces of 
any two simulators. These are based on traces of the 
architecturally visible state, i.e., instructions, PC, 
registers, and values read/written to particular 
memory addresses. 3) Directed tests are used for 
testing state that is not visible from the architectural 
level and may therefore not be implemented in a 
higher-level simulator like the ISA simulator. An 
example is monitoring the cache replacement policy 
by inspecting the contents of the cache directly. 
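Trace comparison amounts to walking two traces of architecturally visible state in lockstep and reporting the first mismatch. The sketch below assumes a hypothetical `Step` record (PC, instruction, register writes, memory accesses); the project's actual trace format is not described here.

```python
from itertools import zip_longest
from typing import Iterable, NamedTuple, Optional, Tuple


class Step(NamedTuple):
    """Architecturally visible effects of one executed instruction."""
    pc: int
    instr: str
    reg_writes: Tuple[Tuple[str, int], ...]    # (register, new value)
    mem_accesses: Tuple[Tuple[int, int], ...]  # (address, value read/written)


_END = object()  # stands in for "this trace has already ended"


def first_divergence(golden: Iterable[Step], other: Iterable[Step]) -> Optional[int]:
    """Index of the first step at which the traces differ, or None if they
    match. A trace that ends before the other counts as a divergence."""
    for i, (g, o) in enumerate(zip_longest(golden, other, fillvalue=_END)):
        if g != o:
            return i
    return None
```

Because it consumes both traces lazily, the same function works on traces streamed from two running simulators, with the ISA simulator as the golden side.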

3.4 Progress on VIRAM Architecture 

One major milestone was finding a partner to
supply the MIPS scalar processor. That company was 
Sandcraft, and they will supply their next generation 
embedded MIPS processor. This is a full MIPS IV 
CPU, including floating point, TLB, and caches. 
Sandcraft has given permission for us to use their 
floating-point unit and TLB in our vector 
coprocessor, considerably reducing our tasks. 

We wrote a paper related to the motivation for 
IRAM, making a case for architecture research on
personal mobile computing, where portable devices
are used for visual computing and personal 
communications tasks [6]. The requirements placed 
on the processor in this environment are energy 
efficiency, high performance for multimedia and DSP 
functions, and area-efficient, scalable designs. We
examined the architectures that were recently
proposed for billion-transistor microprocessors, and
although they are very promising for the stationary
desktop and server workloads, we found that most
of them are unable to meet the challenges of the new 
environment and provide the necessary enhancements 
for multimedia applications running on portable 
devices. 



3.5 VIRAM Instruction Set Architecture 

Based on simulation results from last Fall, we added 
new instructions to the IRAM ISA to improve the 
speed of reduction operations. In the original ISA, 
reductions were done with an "extract" instruction 
that took a set of values in one vector register and 
moved them to another vector register. The 
instruction was fairly general and could involve inter- 
lane communication, which made it difficult to 
determine which instances of the instruction could be
chained. (In the VIRAM-1 implementation, for
example, there are 4 lanes, and the extract instruction 
required a general crossbar style communication 
between them.) As a result, the instruction was slow. 
In the new ISA, there is an extract instruction that 
moves the upper half of one vector register to another 
and is only defined for power-of-2 vector lengths. As 
a result, there is no inter-lane communication as long 
as the vector length is greater than the number of 
lanes, which makes the chaining decision easy to 
implement. In a reduction operation on an 
implementation with 4 lanes, only the last two extracts
will involve inter-lane communication. These new 
instructions were added to the ISA simulator, and 
after the performance model is modified to match 
expected performance, we will rerun some of the 
reduction-intensive benchmarks. 
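The effect of the new power-of-2 extract on lane traffic can be checked with a short model. It assumes element i resides in lane i mod `lanes`, so moving element i + half to position i stays within a lane exactly when half is a multiple of the lane count; the function name is hypothetical.

```python
from typing import List, Tuple


def reduce_with_halving_extract(values: List[int],
                                lanes: int = 4) -> Tuple[int, List[Tuple[int, bool]]]:
    """Sum-reduce a power-of-2-length vector by repeatedly extracting the
    upper half and adding it to the lower half.
    Returns (sum, [(half, crosses_lanes) for each extract step])."""
    vl = len(values)
    assert vl > 0 and vl & (vl - 1) == 0, "extract is defined only for power-of-2 lengths"
    v = list(values)
    steps = []
    half = vl // 2
    while half >= 1:
        upper = v[half:2 * half]                      # the extract
        v = [a + b for a, b in zip(v[:half], upper)]  # vector add on the lower half
        steps.append((half, half % lanes != 0))       # inter-lane move needed?
        half //= 2
    return v[0], steps


if __name__ == "__main__":
    total, steps = reduce_with_halving_extract(list(range(64)))
    print(total)  # 2016
    print(steps)  # [(32, False), (16, False), (8, False), (4, False), (2, True), (1, True)]
```

For a 64-element vector on 4 lanes, only the final two steps (half = 2 and half = 1) cross lanes, matching the description above.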

A second issue on which progress continues is the
exception handling model. Because we are using a
floating-point core from Sandcraft in both the RISC
scalar processor and the vector co-processor, some of
the exception handling is out of our control. We have
determined that infinities and NaNs will propagate,
which is a good match to the vector processing 
model, but some other issues such as denormalized 
numbers have yet to be resolved. 

3.6 Functional ISA Simulator 

The functional ISA simulator was updated to reflect 
some new instructions, including the extract 
described above. In addition, we built a debugging 
version of the simulator to aid in debugging assembly 
language programs. The debugger allows 
programmers to set breakpoints and examine the 
architectural state, such as register values. This tool 
has proven invaluable in developing benchmarks. 

3.7 Performance Simulator 

Graduate students in the graduate architecture course 
used the performance simulator during Fall 1998. 
Based in part on feedback from those users, a 
significant redesign of the system was done to make 
it more extensible and allow for further investigations 
of the implementation parameters. Several new 
internal releases were done. In addition, some output 
has been added to display the internal pipeline states, 
which is critical in helping find performance bugs in 
assembly language code. Visualizing the pipeline
lets one determine the cause of stalls, such as
memory bank conflicts, structural hazards, and TLB
misses, and gives the programmer some idea of how
the instructions might be rearranged to improve 
performance. 

4. SOFTWARE TOOLS 

4.1 V-IRAM Compiler 

A former member of the SGI/Cray compiler team, 
Dave Judd, recently joined the IRAM project to work 
on a port of the Cray compiler to the VIRAM
architecture. He began studying the current code
generator (which has been replaced) and learning
about related compiler efforts within SGI/Cray.
Unlike the ISUIF compiler, the Cray compiler had no MIPS
backend for the scalar code, so the first milestone
was a complete MIPS backend. The next step will be
to add vector instructions to the backend; the 
compiler already performs automatic vectorization, 
including outer loop vectorization which is important 
for many applications, so the main problems are to 
modify code generation and change instruction 
scheduling to match the performance characteristics 
of VIRAM-1.

4.2 Benchmarks and Applications 

We continued working on the two application efforts 
in speech and video processing. For the speech 
application, we are working with the current Cray 
vectorizing compiler on an existing Cray machine. 
Although we cannot get detailed performance 
information about VIRAM, this development effort 
will give us basic information such as how to 
annotate loops and how well the overall application 
vectorizes. 

Our second application in video processing uses 
some of the multimedia features of IRAM, which are 
unlikely to be supported by the Cray compiler, so 
some of the kernels are being written directly in 
VIRAM assembler. We have compiled the modified 
H.263 and MPEG-2 source code using a MIPS scalar 
compiler and run both on the IRAM simulator. 
(H.263 is another standard of video compression used 
often in video conferencing.) We are in the process of 
replacing the key computations with hand-vectorized 
kernels, starting with the Sum of Absolute
Differences (SAD) used in motion estimation. We
implemented SAD in VIRAM assembly language 
and have started a performance simulation study of 
different algorithms. We plan to complete this study, 
taking advantage of the new extract instruction for 
reductions. 
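SAD compares a block of the current frame against a same-sized window of a reference frame. The row-at-a-time sketch below mirrors how a vector version could be organized (element-wise absolute difference across a row, then a reduction); the function name and the list-of-rows frame layout are assumptions.

```python
from typing import List


def sad(block: List[List[int]], ref: List[List[int]], dy: int = 0, dx: int = 0) -> int:
    """Sum of absolute differences between `block` (a list of pixel rows)
    and the same-sized window of `ref` whose top-left corner is (dy, dx)."""
    total = 0
    for y, row in enumerate(block):
        window = ref[y + dy][dx:dx + len(row)]
        # Vector-style step: element-wise |a - b| across one row ...
        diffs = [abs(a - b) for a, b in zip(row, window)]
        # ... followed by a reduction of that row (on VIRAM, e.g. via extracts).
        total += sum(diffs)
    return total


if __name__ == "__main__":
    block = [[1, 2], [3, 4]]
    ref = [[1, 2, 0], [3, 5, 0], [0, 0, 0]]
    print(sad(block, ref))        # 1
    print(sad(block, ref, 0, 1))  # 9
```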

At a higher level, we also explored different motion- 
estimation algorithms (all of which use SAD) for 
IRAM. Most of the work was concentrated on three 
methods: Exhaustive Search, searching through all
the macro-blocks; Three-step Search, limiting the
search of macro-blocks in a hierarchical fashion; 2D Log
Search, the same as three-step search but with 
somewhat less computation. In addition to motion- 
estimation, different schemes to speed up the DCT 
have been investigated. The first one is the Zero 
Coefficient Prediction Scheme: if the coefficients of a
macro-block are all zero, there is no need to
compute the DCT for that block. In addition, different
types of fast DCT algorithms were considered. Using 
rough timing estimates, all of these algorithms have 
roughly the same performance on VIRAM-1, so more
investigation will be needed using the performance 
simulator. 
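Exhaustive search and three-step search differ mainly in the number of SAD evaluations. The sketch below counts them for a ±4 search window; the function names are hypothetical, frames are lists of rows, and the example uses random data, so it illustrates evaluation counts rather than the quality of any particular algorithm. Three-step search may settle on a local minimum, which exhaustive search cannot.

```python
import random


def sad(block, ref, y, x):
    """Sum of absolute differences between block and the window of ref
    whose top-left corner is (y, x)."""
    n = len(block)
    return sum(abs(block[r][c] - ref[y + r][x + c])
               for r in range(n) for c in range(n))


def in_bounds(ref, n, y, x):
    return 0 <= y <= len(ref) - n and 0 <= x <= len(ref[0]) - n


def exhaustive_search(block, ref, cy, cx, radius=4):
    """Try every displacement within +/- radius of (cy, cx)."""
    n, best, evaluations = len(block), None, 0
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            if in_bounds(ref, n, cy + dy, cx + dx):
                evaluations += 1
                cost = sad(block, ref, cy + dy, cx + dx)
                if best is None or cost < best[0]:
                    best = (cost, dy, dx)
    return best, evaluations


def three_step_search(block, ref, cy, cx, step=4):
    """Test the 8 neighbours of the current centre at the current step,
    move to the best point, halve the step, and repeat until it reaches 0."""
    n = len(block)
    best_cost, by, bx = sad(block, ref, cy, cx), cy, cx
    evaluations = 1
    while step >= 1:
        centre_y, centre_x = by, bx
        for dy in (-step, 0, step):
            for dx in (-step, 0, step):
                y, x = centre_y + dy, centre_x + dx
                if (dy or dx) and in_bounds(ref, n, y, x):
                    evaluations += 1
                    cost = sad(block, ref, y, x)
                    if cost < best_cost:
                        best_cost, by, bx = cost, y, x
        step //= 2
    return (best_cost, by - cy, bx - cx), evaluations


if __name__ == "__main__":
    rng = random.Random(1)
    ref = [[rng.randrange(256) for _ in range(32)] for _ in range(32)]
    cy = cx = 12
    # The current block is the reference window displaced by (2, 3).
    block = [row[cx + 3:cx + 3 + 8] for row in ref[cy + 2:cy + 2 + 8]]
    print(exhaustive_search(block, ref, cy, cx))  # ((0, 2, 3), 81)
    print(three_step_search(block, ref, cy, cx))
```

With all candidates in bounds, the exhaustive search evaluates 81 positions and the three-step search 25 (the centre plus 8 neighbours at step sizes 4, 2, and 1).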

In addition to these long-term application efforts, we 
developed some additional benchmarks written 
directly in VIRAM assembler. The new benchmarks 
include reduction operations using the new 
instructions, three basic image operations from
multimedia (Chroma-Key, image composition, and
color conversion), convolutions, FFTs, and matrix-vector
multiplication. (The latter three had been
written using the Vic tool, but were recoded in the 
new ISA by hand.) In addition, there are some 
benchmarks, such as encryption, that are available 
from ISUIF. So far, these benchmarks have been 
especially useful in helping to make the simulators
more robust and usable.

5. ISTORE 

Two years ago we found another application
of IRAM, which has proved so interesting that it has
taken on a life of its own. ISTORE is a
server architecture with goals quite different from 
prevailing wisdom: 

1) Maintainability: ISTORE is intended to have a
low cost of ownership, which implies extensive 
monitoring to discover errors, a physical design 
that makes repair easy and obvious, the ability to 
insert faults to test monitoring features, and so 
on. 

2) Availability: We believe that a highly available
system will also be easier to maintain, as it can 
decouple the failure of a component from the 
need for a person to fix the machine 
immediately. Although general solutions from 
companies like Tandem work well (mirroring, 
process pairs, and so on), we are looking for 
more economical solutions. Our hope is that by 
tailoring the software to a single application
we can reduce the complexity of the solutions.

3) Evolutionary Growth: We want a system that can
scale well beyond sizes of today's servers, to 
thousands or even tens of thousands of disks. As 
ISTORE leverages cluster technology, we expect 
to have multiple generations of hardware over 
time, and hence an emphasis on heterogeneous
systems that evolve over time. Evolutionary
growth is in contrast to the traditional definition of
scalability, which only promises the ability to
construct different-sized homogeneous systems.

We abbreviate these three goals as AME.

The insight is that IRAM offers such a small, low-power
computer that it could be included in a canister
with a disk at no practical increase in the size of the 
canister or in the cooling requirements of the canister. 
Then rather than use the SCSI interface of the disk to 
connect the disk to the backplane, the serial lines of 
IRAM would be connected to redundant network 
interfaces, which would be the connection of the 
canister to the backplane. The backplane would then 
be based on single-chip crossbar switches to offer
high bandwidth connections between ISTORE nodes. 

By having a computer in every I/O building block, 
we can monitor the behavior of every disk, and 
provide new features such as fault insertion (to test 
the behavior of operating system software that is often
activated only by failures in the field) and fast failure (to ensure
that a suspect device is shut down immediately rather
than continuing to make sporadic errors).

5.1 ISTORE Hardware 

The IRAM group is moving forward with plans to 
build a large system for prototyping ISTORE. This 
will be an 80-node system designed for high reliability
as well as performance; each node is physically 
packaged as a "brick." The node processor is the 
Intel Mobil Pentium II operating at 366 MHz. 
Memory is incorporated using small outline DIMMs 
(SODIMMs) with a target capacity of 128 MB/node. 
Both of these technologies, originally developed for 
portable PCs, are well suited to the space and power 
constraints of the brick. 

For inter-node connectivity, we are including four 100 Mb/s Ethernet ports on each brick. Three ports per brick will connect to a two-level network built from commercial Ethernet switches. Our design for this network includes 3-way redundancy at the first switch level and 2-way redundancy at the second level. A low-profile SCSI disk drive (target: 18 GB) and a SCSI controller chip will be included in each brick. The design will also include the various chips required to make each brick "PC compatible," simplifying the task of porting software.

To perform real-time monitoring of the node processor, we are including a second, embedded diagnostic processor with its own communication network. The processor we have chosen is in the Motorola 68K family and includes an on-chip protocol controller for the CAN (Controller Area Network) network. This network, originally developed for automobiles, is a good match to this application. The diagnostic processor will monitor environmental conditions (temperature and voltage), communicate with the node processor, and also allow us to perform some "intrusion" experiments.

Construction of the ISTORE-1 prototype, targeted to be an 80-node system, has continued. Anigma, a small company in southern California, is designing the boards for the nodes. We received the first version of these boards from an initial "test run" of 10. Anigma found a glitch in the SCSI interface, which they plan to fix over the next month; in the meantime we will be testing the other parts of the board. In particular, the board contains a diagnostic processor and a set of sensors, which were designed at Berkeley. To utilize the diagnostic processor, we are building a simple custom OS, which uses a round-robin scheduler to support multiple kernel threads and processes. Critical error conditions, including impending power loss, are handled asynchronously, so a priority scheduler is not necessary (eliminating priority inversion as a source of starvation). Processes will be downloaded at run time and will communicate through the attached serial and network links. They will also provide failure-resistant intelligence, such as demanding that the OS shut down the brick if multiple neighboring nodes are overheating. The OS will also use a portion of its battery-backed SRAM as a time-stamped log, which will durably record error conditions.
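To make the diagnostic OS concrete, here is a minimal Python sketch of a round-robin scheduler over cooperative tasks, together with a fixed-capacity, time-stamped log. It assumes that generators stand in for kernel threads and that a bounded deque stands in for the battery-backed SRAM region; the class names RoundRobin and RingLog are hypothetical, and the sketch models neither the asynchronous handling of critical errors nor real durability.

```python
import time
from collections import deque


class RoundRobin:
    """Run cooperative tasks (generators) one time slice at a time, in turn."""

    def __init__(self):
        self.ready = deque()

    def spawn(self, task):
        self.ready.append(task)

    def run(self):
        while self.ready:
            task = self.ready.popleft()
            try:
                next(task)            # run until the task yields its slice
            except StopIteration:
                continue              # finished tasks leave the queue
            self.ready.append(task)   # otherwise go to the back of the line


class RingLog:
    """Fixed-capacity log of (timestamp, message); oldest entries are overwritten."""

    def __init__(self, capacity, clock=time.monotonic):
        self.entries = deque(maxlen=capacity)  # stands in for the SRAM log region
        self.clock = clock

    def record(self, message):
        self.entries.append((self.clock(), message))
```

A task such as a temperature check would call log.record(...) when it sees a problem and then yield, giving the next task its turn; because no task has priority over another, there is no priority inversion to reason about.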

5.2 ISTORE Software 

Since our last progress report we have refined our vision of the ISTORE software architecture. The ISTORE software will provide a comprehensive framework for constructing introspective applications, i.e., adaptive software that leverages continuous self-monitoring. The ISTORE runtime system will automate the collection of monitoring data from the hardware bricks and will provide applications with customized, application-specific views of that data; it will then allow applications to define triggers over those views that automatically invoke adaptation routines when application-specific conditions are met. For common adaptation goals, the ISTORE system software will go further by providing an extensible mechanism for automatically generating monitoring and adaptation code from application policy specifications expressed as constraints in declarative, domain-specific languages. Through this mechanism, an application designer will obtain the full benefits of an application-tailored introspective runtime system without having to write large amounts of code and without resorting to ad hoc techniques.
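A minimal sketch of the trigger idea, under stated assumptions: raw monitoring data arrives as a dictionary mapping metric names to values, a view is any function that reduces it to one application-specific number, and a trigger pairs a view with a condition and an adaptation routine. The names Monitor and Trigger, the ".latency_ms" metric naming scheme, and the 20 ms threshold in the usage note are all hypothetical and not part of the ISTORE design.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Sample = Dict[str, float]  # hypothetical: metric name -> value reported by the bricks


@dataclass
class Trigger:
    view: Callable[[Sample], float]     # application-specific view of raw data
    condition: Callable[[float], bool]  # when adaptation should happen
    action: Callable[[float], None]     # adaptation routine to invoke


@dataclass
class Monitor:
    triggers: List[Trigger] = field(default_factory=list)

    def register(self, trigger: Trigger) -> None:
        self.triggers.append(trigger)

    def ingest(self, sample: Sample) -> None:
        # Evaluate every view on each new sample; fire actions whose condition holds.
        for t in self.triggers:
            value = t.view(sample)
            if t.condition(value):
                t.action(value)


def mean_disk_latency(sample: Sample) -> float:
    # Hypothetical view: average latency across all bricks that reported one.
    latencies = [v for k, v in sample.items() if k.endswith(".latency_ms")]
    return sum(latencies) / len(latencies) if latencies else 0.0
```

For example, registering Trigger(mean_disk_latency, lambda v: v > 20.0, rebalance) would call a (hypothetical) rebalance routine whenever average disk latency across the bricks exceeds 20 ms. Generating such views and triggers from declarative policy, as the text describes, would sit on top of a mechanism like this.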

As a first step toward understanding how application behavior is affected by low-level system operation, and how the system might adapt its low-level behavior to meet application performance and reliability goals, we are instrumenting a single-node operating system to collect detailed real-time system statistics that will then be analyzed off-line using data mining software. This will help guide us in determining which system statistics should be monitored to drive adaptation, and in understanding how database technology, particularly embedded databases like Berkeley DB, can be used to simplify the organization and processing of the statistics gathered.

We have continued work on performance robustness, to ensure that the performance of the system degrades only gradually as components fail or slow down. For example, as disks become fragmented, their observed performance drops, resulting in a kind of performance heterogeneity in the system. For both performance robustness and our work on AME characteristics, we have identified several small application projects to be completed in the next two quarters. The first is a web server that uses the Apache software running on a modified version of the FreeBSD OS. The second is a mail server; we are currently exploring two different system designs, one involving the Petal software described in a previous report. The third is a set of database kernels, especially for decision-support workloads including data mining.

Unlike the mail and web services, the data mining and related workloads involve a set of closely coordinated processes that will run on each of the ISTORE nodes. For programming these applications, we want a system that helps programmers avoid the kinds of programming errors, such as uninitialized pointer dereferences, memory leaks, and out-of-bounds buffer accesses, that lead to both security and robustness problems. We are therefore looking at a dialect of Java, called Titanium, with extensions for high-performance I/O. Previously, Titanium's runtime support for file I/O was limited to the classes present in Java, which have proven too inefficient to meet the demands of I/O-intensive applications. We have developed a new library that adds support for bulk I/O in both synchronous and asynchronous forms. The synchronous model is simpler to use, but the asynchronous model allows I/O to be overlapped with computation. These libraries remove much of the overhead associated with the Java I/O libraries, which require the programmer to read arrays one element at a time, while preserving the safety guarantees that Java provides. Our measurements show a one to two order-of-magnitude improvement over the existing I/O facilities.
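The difference between element-at-a-time and bulk I/O can be sketched in Python, purely as an analogy: this is not Titanium's library, and the function names are hypothetical. read_elementwise issues one read call per element, read_bulk fills the whole typed array in a single call with the standard array.fromfile, and read_bulk_async hands the bulk read to a thread pool so the caller can keep computing while the I/O is in flight.

```python
import array
from concurrent.futures import Future, ThreadPoolExecutor


def read_elementwise(path: str, n: int) -> array.array:
    """One read call per element, like a loop of per-element stream reads."""
    out = array.array("d")
    with open(path, "rb") as f:
        for _ in range(n):
            out.frombytes(f.read(out.itemsize))
    return out


def read_bulk(path: str, n: int) -> array.array:
    """Synchronous bulk read: one call fills the whole typed array."""
    out = array.array("d")
    with open(path, "rb") as f:
        out.fromfile(f, n)  # raises EOFError if fewer than n items are available
    return out


def read_bulk_async(pool: ThreadPoolExecutor, path: str, n: int) -> Future:
    """Asynchronous bulk read: returns immediately so computation can overlap I/O."""
    return pool.submit(read_bulk, path, n)
```

Calling read_bulk_async returns a concurrent.futures.Future; the caller does other work and later blocks on .result(). That mirrors the trade-off in the text: the synchronous call is simpler, while the asynchronous one lets I/O overlap computation.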

Benchmarks have historically played a key role in guiding the progress of computer systems research and development, but have traditionally neglected the areas of maintainability, availability, and evolutionary growth. These three areas, which we refer to as AME, have recently become critically important in high-end system design. As a first step in addressing this deficiency, we introduced a general methodology for benchmarking the availability of computer systems. Our methodology uses fault injection to provoke situations where availability may be compromised, leverages existing performance benchmarks for workload generation and data collection, and can produce results either as detail-rich graphical presentations or as distilled numerical summaries. We applied the methodology to measure the availability of the software RAID systems shipped with Linux and Windows 2000 Server, and found that it is powerful enough not only to quantify the impact of various failure conditions on the availability of these systems, but also to unearth their design philosophies with respect to transient errors and recovery policy. Finally, we show how these availability benchmarks form a possible foundation for the construction of general maintainability benchmarks. This work is described in a paper to be presented at Usenix 2000 [5].

6. PROJECT MEETINGS 

In October we had a meeting with representatives from Atlantic Aerospace and Boeing to discuss the benchmarking effort for the Data-Intensive Computing program.

We held a joint project review retreat with the Berkeley Reconfigurable Systems group (BRASS) on January 14-16, 2000. Students working on the project and on research-related class projects presented their results to our industry affiliates. The representatives were as follows: David Anderson (Seagate Technology), Krste Asanovic (MIT), Bill Athas (USC/ISI), Mike Beunder (Silicon Access Technology), Ray Chen (Network Appliance), Jack Choquette (Sandcraft), Jason Golbus (Myricom), James Hamilton (Microsoft), Naohiko Irie (Hitachi), David Kiefer (Silicon Graphics), Jongbok Lee (LG Semicon), Mike McGregor (Micron Technology), Todd Merritt (Micron Technology), Steve Scott (Silicon Graphics), Harvey Stiegler (Texas Instruments, Inc.), Jack Veenstra (Sandcraft), Sanjay Vishin (Avaj), Hing Wong (Silicon Access), Mike Ziegler (Hewlett-Packard Company).

7. CONCLUSION 

This project made significant progress on all fronts, from chip design through applications software, and continued work on applications, work so important that it has almost taken on a life of its own. We hope the project will continue for the next several years.






8. PUBLICATIONS 

[1] Aaron Brown, David Oppenheimer, Kimberly Keeton, Randi Thomas, John Kubiatowicz and David A. Patterson, "ISTORE: Introspective Storage for Data-Intensive Network Services." Presented at HotOS-VII, 1999.

[2] R. Arpaci-Dusseau, E. Anderson, N. Treuhaft, D. 
Culler, J. Hellerstein, D. Patterson, K. Yelick, 
"Cluster I/O with River: Making the Fast Case 
Common." IOPADS '99. 

[3] Randi Thomas and Katherine Yelick, "Efficient 
FFTs on IRAM." Presented at the 1st Workshop on 
Media Processors and DSPs. 

[4] T. Nguyen, A. Zakhor and K. Yelick, "Performance Analysis of an H.263 Video Encoder on VIRAM." Submitted to ICIP 2000. An earlier version of this appeared as an M.S. report by the first author in December 2000.

[5] A. Brown and D. Patterson, "Towards Maintainability, Availability, and Growth Benchmarks: A Case Study of Software RAID Systems." To appear in Usenix 2000.

[6] D. Bonachea, "Asynchronous Bulk File I/O in Titanium, a High Performance SPMD Java Dialect." Submitted to JavaGrande 2000.

[7] Richard M. Fromm, "Vector IRAM Memory 
Performance for Image Access Patterns." Master's 
Report, University of California, Berkeley, Computer 
Science Division, October 1999. 

[8] Christoforos Kozyrakis, "Media-Enhanced Vector Architecture for Embedded Memory Systems." Master's Report, University of California, Berkeley, Computer Science Division, July 1999. Appeared as Technical Report UCB//CSD-99-1059.




PATENT 



DOCKET NO. 5231.16-4004C 



IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 



Applicant: John S. Yates, Jr., et al.
Serial No.: 09/429,094    Art Unit: 2155
Filed: October 28, 1999    Examiner: David Eng
Title: SIDE TABLES ANNOTATING AN INSTRUCTION STREAM



INFORMATION DISCLOSURE STATEMENT 



COMMISSIONER FOR PATENTS 
Washington, D.C. 20231 

In accordance with 37 C.F.R. §§ 1.56, 1.97 and 1.98, Applicant wishes to make of record the
enclosed items, as listed on the accompanying Form PTO-1449. Applicant respectfully requests the 
Examiner to fully consider the items and independently ascertain their teaching before issuance of the 
next action, and to make them of record in the file. The Examiner is also requested to initial and return a 
copy of the enclosed Form PTO-1449 to evidence such consideration. 

Applicant has listed publication dates on the attached Form PTO-1449 based on information 
presently available to the undersigned. However, the listed publication dates should not be construed as 
an admission that the information was actually published on the date indicated. Applicant reserves the 
right to establish the patentability of the claims over any information provided herewith, and/or to prove 
that this information may not be prior art, and/or to prove that this information may not be enabling for 
the teachings purportedly offered. This Information Disclosure Statement should not be construed as a 
representation that information more material to the examination of this application does not exist. 

This is a resubmission of a reference previously brought to the Examiner's attention in IDS's 
filed November 21, 2000, August 23, 2001 and February 4, 2002. No fee is due for this Information 
Disclosure Statement because this reference was previously filed in this application with the Information 
Disclosure Statement of November 21, 2000, for which the proper fee was paid. 

The Commissioner is hereby authorized to charge any additional fees that may be required for 
this Information Disclosure Statement, or credit any overpayment, to Deposit Account 50-0675, Order 
No. 5231.16-4004C. 



Respectfully submitted, 



SCHULTE ROTH & ZABEL 



Dated: July 8, 2002 




Registration No. 36,461 



SCHULTE ROTH & ZABEL 

919 Third Avenue 

New York, New York 10022 

(212) 756-2000 

(212) 593-5955 Facsimile 



CORRESPONDENCE ADDRESS: 



I certify that this correspondence, along with any documents 
referred to therein, is being deposited with the United States 
Postal Service on July 8, 2002 as First Class Mail in an 
envelope with sufficient postage addressed to The 




Information Disclosure Statement 
9210792.3 






09/429,094 








It took Transmeta engineers $100 million, five years of secret toil, and a little magic to create fast, low-power chips that turn into x86s in a microsecond

TRANSMETA CORP.'S CRUSOE CHIPS, due to ship in May or June, look nothing like Intel Corp.'s Pentium processors. In fact, they do not even have a logic gate in common. They are smaller, consume between one-third and one-30th the power (depending on the application), and implement none of the same instructions in hardware.

But the Crusoe microprocessors [Fig. 1] can run the same software that runs on IBM PC-compatible personal computers with Pentium chips, for instance Microsoft Windows or versions of Unix, along with their software applications.

That's the magic trick. And it took a bunch of engineering magicians, and over $100 million of venture capital, to pull it off.

Transmeta's magic show started more than five years ago. David Ditzel, then the chief technical officer of Sun Microsystems Inc.'s Sparc business, headquartered in Palo Alto, Calif., had studied ways to assist Sparc processors in running x86 software by emulation. He hired Colin Hunter as a short-term contractor on a project to determine what new instructions might be added to Sparc to help make emulation run faster. They completed the project and produced an internal report. But it appeared unlikely that merely adding a few new instructions to Sparc would significantly enhance the processor's ability to run x86 software.

Ditzel had also become concerned about the ever-growing complexity of microprocessor design. He had long been a champion of simple microprocessors: with a professor from the University of California at Berkeley, David Patterson, he had coauthored the pioneering 1980 paper "The Case for the Reduced Instruction Set Computer." But as time went on, he told IEEE Spectrum, more and more functions got piled into RISC chips.

This complexity meant that RISC chips were getting bigger and hotter and were taking much longer to design and debug, while improvements in performance were limited. Some chip designs were so complex, in fact, that hundreds of engineers were needed for one design team. Looking 10 years into the future, Ditzel thought things would only get worse.

So, in early March 1995, he quit his job at Sun. Within a few weeks, he had worked out an idea for a new type of microprocessor. The new device would be fast and simple, and although it would bear no resemblance to an x86 processor, it would be surrounded by a layer of software that could transform, on the fly, an x86 program into code that the simple microprocessor could understand. The technique, called dynamic binary translation, gives programs the impression that they are running on an x86 machine.

Ditzel called on Colin Hunter again, and the two prepared to file papers to incorporate as a company. But first they needed a name, one that would not give away what they were doing and one not already taken by any of the other numerous technology companies in California. After running various combinations of high-tech-sounding syllables past the California Secretary of State's office, they found one that was available: Trans-

[1] The highest-performance Crusoe chip, the TM5400, is 73 mm² in area. It contains a 128 KB level one (L1) cache and a 256 KB level two (L2) cache, on-chip LongRun power control, an integrated peripheral-component-interface bus, and double- and standard-data-rate dynamic RAM controllers.

The chips were fabricated by IBM Corp. at its foundry in Burlington, Vt. They feature five levels of copper interconnect and a minimum feature size of 0.18 µm.

[2] Very-long-instruction-word (VLIW) architecture forms the basis of the Crusoe chips. It consists primarily of working general-purpose and floating-point registers and their shadow registers, the execution units, memory, and memory control circuits. The chips have a built-in north bridge, and the TM5400 features LongRun power-management circuitry.



atoms that can be processed together, 
and the processor executes them. 

Another early breakthrough was understanding the factors that traditionally made emulation slow and developing alternatives to eliminate these obstacles. A key reason for the sluggish performance was the extra instructions that an emulator has to run to match the exact state of a processor in a different architecture. "In traditional emulation," Laird told Spectrum, "you are taking a program written for a processor with one architecture and getting it to run on a processor with a different one, and the states of the two processors are not the same."

For instance, an x86 program may expect a processor to set a condition code, and the program performs a branch operation based on the value of that condition code. But when the program is run on a PowerPC, say, the condition code is not generated in the same way that an x86 processor would have generated it. So the emulator has to go through a number of PowerPC instructions to set the condition code in the same way as the x86.

"What we discovered," said Laird, "was that if you can facilitate implementing the state of the first processor in the second one by designing certain registers to hold that state, the emulation software doesn't have as big an overhead."
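To see why condition codes are costly to emulate, the Python sketch below computes four of the x86 arithmetic flags (CF, ZF, SF, OF) for a 32-bit ADD. A host processor without matching flag state would have to perform roughly this extra work after every emulated add; a host with registers designed to hold x86 state, as Laird describes, can avoid much of it. PF and AF are omitted for brevity, and the function name is hypothetical.

```python
MASK32 = 0xFFFF_FFFF
SIGN32 = 0x8000_0000


def x86_add32_flags(a: int, b: int):
    """Return (result, flags) for a 32-bit x86 ADD of a and b."""
    a &= MASK32
    b &= MASK32
    full = a + b       # unbounded Python sum
    r = full & MASK32  # 32-bit result as the x86 would store it
    flags = {
        "CF": full > MASK32,                     # unsigned carry out of bit 31
        "ZF": r == 0,                            # result is zero
        "SF": bool(r & SIGN32),                  # copy of the result's sign bit
        # Signed overflow: the result's sign differs from the signs of both operands.
        "OF": bool((a ^ r) & (b ^ r) & SIGN32),
    }
    return r, flags
```

For example, adding 0x7FFFFFFF and 1 yields 0x80000000 with SF and OF set but CF clear, while adding 0xFFFFFFFF and 1 yields 0 with CF and ZF set. A later conditional branch must see exactly these values, which is the state-matching problem described above.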

Another difficulty with emulation involves so-called exceptions, which are caused by processor faults, errors, traps, or other exceptional events. Since exceptions halt the execution of a program, the operating system must find the cause of the exception and re-execute the instructions that faulted in a way that isolates the fault. The question of how to deal with exceptions was brought up early in the design process. It was Cmelik who identified the seriousness of the problem: not solving it would mean a dead end for the technological approach being taken.

The problem arises, explained Laird, because the VLIW program they created reorders the x86 instructions. So if the x86 program creates a fault, such as a divide-by-zero (although it may happen infrequently, it still may happen), the processor has to be able to create the exact same state as any other x86 processor would, and hand it off to the operating system to deal with the fault.

The solution came several weeks later with a novel hardware/software combination called commit and rollback, which, according to Wing, "is really the fundamentally different thing about our machine."

Commit and rollback was implemented by creating an extra set of registers, called shadow registers, in addition to the working registers. With the execution of a software commit instruction, the shadow registers duplicate the data in the working registers. As the operation progresses, the working registers are updated by each computational operation. But the shadow registers are not updated until the processor receives an all-clear signal in the form of another commit instruction, indicating that no exception occurred.

When the processor hits a fault, Transmeta's software issues a rollback instruction, and the information in the shadow registers is copied back into the working registers. "So we can reverse the execution," said Laird. "You come to a state, say, 'Oops, I did a bad thing,' go back in time instantly in one cycle, and start again." The next time around, the software schedules the operations more conservatively, say, by executing the instructions in precisely the same order as the original x86 program.
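The commit-and-rollback idea can be modeled in a few lines of Python. This is a software analogy, not Transmeta's hardware: a dictionary of working registers is checkpointed into a shadow copy on commit and restored from it on rollback, and run_translation shows the recovery policy described above, retrying a faulting region in original x86 order. The register names, the use of ZeroDivisionError as the fault, and all class and function names are assumptions.

```python
class ShadowedRegisterFile:
    """Working registers plus a shadow copy holding the last committed state."""

    def __init__(self, names):
        self.working = {n: 0 for n in names}
        self.shadow = dict(self.working)

    def write(self, name, value):
        self.working[name] = value        # only the working copy changes

    def commit(self):
        self.shadow = dict(self.working)  # all-clear: checkpoint current state

    def rollback(self):
        self.working = dict(self.shadow)  # discard everything since last commit


def run_translation(regs, optimized, in_order):
    """Run a reordered translation; on a fault, roll back and redo it in order."""
    regs.commit()
    try:
        optimized(regs)
    except ZeroDivisionError:
        regs.rollback()   # back to the exact x86 state at the last commit
        in_order(regs)    # conservative retry; a fault here would be precise
    regs.commit()
```

In hardware the checkpoint and restore each take a single step, which is why the optimized, reordered path can be attempted aggressively: a fault costs a rollback and a slower retry, not an incorrect x86 state.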

The team realized that, in the case of a rollback, data to be stored in memory would also have to be rolled back. They came up with a circuit called a gated store buffer to keep track of the stores between commit points. If an exception occurs in this period, the system can instantly roll back to the previous state and discard those stores.

The gated store buffer has a committed and an uncommitted side with a "gate" in the middle. After a computation creates the data to be stored, the data goes to the uncommitted side of the buffer. After a commit instruction, the gate opens and the data on the uncommitted side moves to the committed side and is then stored in memory.

This process may involve a substantial amount of data. A single x86 instruction, for example, can modify 130 bytes of memory. Other superscalar microprocessors also need store buffers, but nothing quite so big.
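A gated store buffer can be sketched the same way. In this hypothetical Python model, stores made since the last commit sit on the uncommitted side; commit opens the gate and drains them to memory in program order, while rollback simply discards them. The committed side is collapsed into an immediate write to memory. Letting loads see the newest uncommitted store to the same address (store-to-load forwarding) is an assumption added so the model behaves sensibly; the article does not describe how Crusoe handles it.

```python
class GatedStoreBuffer:
    """Holds stores until commit; rollback throws them away."""

    def __init__(self, memory):
        self.memory = memory    # dict: address -> value
        self.uncommitted = []   # (address, value) pairs in program order

    def store(self, addr, value):
        self.uncommitted.append((addr, value))  # memory is not touched yet

    def load(self, addr):
        # Assumed forwarding: the newest pending store to this address wins.
        for a, v in reversed(self.uncommitted):
            if a == addr:
                return v
        return self.memory.get(addr, 0)

    def commit(self):
        # Gate opens: pending stores reach memory in the order they were made.
        for a, v in self.uncommitted:
            self.memory[a] = v
        self.uncommitted.clear()

    def rollback(self):
        self.uncommitted.clear()  # memory still holds the last committed state
```

Paired with the shadow registers, this gives a translated region all-or-nothing behavior: either every register write and store since the last commit becomes visible, or none does.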

IT'S CODE MORPHING! 

While development of the chip architecture was progressing, it was beginning to look as if the group might never get funded. Group members kept explaining to venture capitalists that with their revolutionary software-based microprocessor, they could attack markets previously owned by x86 chips, but no one bit. By the end of the summer of 1995, Ditzel and Hunter had pitched nearly 30 venture capitalists; Laird often went along as an observer.

"They just didn't get it," Laird said. "Dave [Ditzel] would start talking about dynamic binary translation, and their eyes would just glaze over. We were pumped up, saying this is a great idea, it is a new microprocessor, and nobody has ever done it this way, but we could've been from Mars for all they cared. We were just getting too technical."

"It was a hard sell," Ditzel told Spectrum. "We were saying we wanted to do hard-core R&D and develop this big new idea and it would take four years. And the venture capitalists would say, 'Couldn't you just have a simple idea you could do in six months?'"

So in midsummer the entire team sat down at their new offices in Redwood Shores to figure out another way to pitch their ideas. They concluded that they needed to sum up the essence of what they were doing in a word or two, a simple, catchy name that the venture capitalists would understand. After tossing around several ideas, Cmelik threw out the term "Code Morphing" and they knew they had it.

They also discarded some of their more 
technical PowerPoint slides and came up 
with a simple sketch of their concept, which 
they called the amoeba [Fig. 3]. The amoeba
explained how a traditional microprocessor 
was, in their design, to be divided up into 
hardware and software. 

Ditzel went back to the venture capital community with the new pitch. Laird sat on the sidelines with his watch. "I timed how long it took, from the first time Dave said Code Morphing, to the time the venture capitalists started using the word themselves," Laird said. "It was less than 5 minutes."

Within a few weeks, several venture capital firms were competing to fund the group. By October they had commitments from Institutional Venture Partners, Menlo Park, Calif., and Walden Group, San Francisco. The check for $3.5 million arrived in December 1995.

"We hadn't changed the principles, we hadn't changed who we are, we hadn't changed anything except how we presented it," Laird said. "We said 'Code Morphing software' and snap, we got funding."

Since trademarked, the buzzword aptly describes what the software does: it takes x86 instructions and recompiles them on the fly into VLIW instructions. As it recompiles them, it optimizes them, making them run, in many cases, more efficiently than the original x86 code. In the rush to market, software writers often compile x86 applications without the highest levels of compiler optimization, to facilitate debugging. Once the software works, it is shipped; there is no time in the schedule to go back, recompile, and re-test, meaning that many software applications have room for improvement.

On a typical software application program, such as Microsoft Word, Code Morphing works like this: it starts with the x86 binary code for a program section that, for example, edits text. In real time, the code goes into Transmeta's software and comes out the other side transformed into VLIW code. In the software's sequence of operations, the x86 instructions are first translated into a sequence of VLIW atoms. Then an optimizer, using some new and some well-known compiler techniques, checks to see if the code can be improved, for instance by the elimination of redundant atoms.

Finally, a scheduler reorders the atoms and groups them into molecules [Fig. 4]. Once translated, the VLIW code is stored in a special part of memory, accessible only by the Code Morphing software, so that particular section of the program need not be translated again.
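The translate, optimize, schedule, and cache steps can be sketched in Python. Everything here is a simplified assumption rather than Transmeta's algorithm: an atom is a three-field tuple, the optimizer only removes an exact repeat of an earlier side-effect-free atom whose operands have not changed since, the scheduler groups atoms greedily in program order (a real scheduler also reorders them and respects execution-unit types), and molecules are padded with no-ops to two or four slots. The decode callback and the cache keyed by block address are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, NamedTuple, Tuple


class Atom(NamedTuple):
    op: str
    dst: str
    srcs: Tuple[str, ...]


NOP = Atom("nop", "", ())
PURE_OPS = {"add", "sub", "and", "or", "xor", "mov"}  # assumed side-effect free


def eliminate_redundant(atoms: Iterable[Atom]) -> List[Atom]:
    """Drop an atom that repeats an earlier pure atom with unchanged operands."""
    out: List[Atom] = []
    for atom in atoms:
        redundant = False
        if atom.op in PURE_OPS and atom.dst not in atom.srcs:
            for prev in reversed(out):
                if prev == atom:
                    redundant = True  # the same value is already in atom.dst
                    break
                if prev.dst == atom.dst or prev.dst in atom.srcs:
                    break             # an intervening write changes the answer
        if not redundant:
            out.append(atom)
    return out


def conflicts(a: Atom, b: Atom) -> bool:
    # True if b depends on earlier atom a (RAW, WAR, or WAW on a register).
    return a.dst in b.srcs or b.dst in a.srcs or a.dst == b.dst


def pad(group: List[Atom]) -> Tuple[Atom, ...]:
    # Molecules hold two or four atoms; unused slots become no-ops.
    size = 2 if len(group) <= 2 else 4
    return tuple(group) + (NOP,) * (size - len(group))


def schedule(atoms: List[Atom], width: int = 4) -> List[Tuple[Atom, ...]]:
    """Greedy, in-order grouping of independent atoms into molecules."""
    molecules, current = [], []
    for atom in atoms:
        if len(current) == width or any(conflicts(p, atom) for p in current):
            molecules.append(pad(current))
            current = []
        current.append(atom)
    if current:
        molecules.append(pad(current))
    return molecules


translation_cache: Dict[int, List[Tuple[Atom, ...]]] = {}


def translate_block(addr: int, x86_block, decode: Callable) -> List[Tuple[Atom, ...]]:
    """Translate a block once, then reuse the cached molecules on later runs."""
    if addr not in translation_cache:
        atoms = [a for insn in x86_block for a in decode(insn)]
        translation_cache[addr] = schedule(eliminate_redundant(atoms))
    return translation_cache[addr]
```

With atoms r1 = r2 + r3, r4 = r5 + r6, a repeated r1 = r2 + r3, and r7 = r1 - r4, the repeat is removed, the first two atoms share one molecule, and the dependent subtraction lands in a second molecule padded with a no-op.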

But that is not the end. The new software continues to monitor how an application is being used. If it finds that a user is spending a lot of time changing the font, for instance, it turns on more levels of optimization to make that part of the program run faster. "We only optimize that portion of the code [being used]," explained Laird. "For the things that are executed infrequently, there is no reason to put in that overhead."
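Profile-driven re-optimization of this kind is commonly implemented with per-block execution counters. The Python sketch below assumes exactly that: each block gets a quick first translation, and once its counter reaches a threshold it is recompiled with a more expensive optimizer. The class name, the callbacks, and the default threshold of 50 are hypothetical; the article does not say how Code Morphing decides which code is hot.

```python
from collections import Counter
from typing import Any, Callable, Dict, Set


class AdaptiveTranslator:
    """Quick first-pass translation; re-optimize blocks that turn out to be hot."""

    def __init__(self, translate: Callable[[int], Any],
                 optimize: Callable[[int], Any], hot_threshold: int = 50):
        self.translate = translate          # cheap translation used on first sight
        self.optimize = optimize            # expensive optimizer, only for hot code
        self.hot_threshold = hot_threshold  # hypothetical tuning parameter
        self.counts: Counter = Counter()    # executions per block address
        self.code: Dict[int, Any] = {}      # current translation per block
        self.optimized: Set[int] = set()

    def execute(self, addr: int) -> Any:
        if addr not in self.code:
            self.code[addr] = self.translate(addr)
        self.counts[addr] += 1
        if addr not in self.optimized and self.counts[addr] >= self.hot_threshold:
            # Hot block: pay the optimization cost once, then reuse the result.
            self.code[addr] = self.optimize(addr)
            self.optimized.add(addr)
        return self.code[addr]
```

Blocks that run only a few times never reach the threshold, so they never pay for optimization, which matches Laird's point about not adding overhead to infrequently executed code.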

One of the challenges of creating the Code Morphing software was to make the Crusoe processor, in many cases, bug-compatible with the x86 so that it would generate the so-called Blue Screen of Death at many of the same times an x86 processor would.

A REAL COMPANY 

Now that the funding was in place, it was time in late 1995 to build this small team of engineers into a real company and actually implement the new microprocessor architecture on a chip.

The design the team came up with contained only about half the logic transistors of an x86 processor. It included five execution units (two arithmetic-logic units, a load/store unit, a branch unit, and a floating-point unit), and it could execute four instructions in a cycle. Sixty-four general-purpose and 32 floating-point working registers were shadowed by 48 general-purpose and 16 floating-point registers. Memory, memory management, and the so-called north bridge (usually a separate IC) rounded out the design.

Even more important was what the design did not include. It had no superscalar decode, grouping, or issue logic. It had no register renaming or segmentation hardware. And it had no floating-point stack hardware. Nor did it have memory management in the front end of the machine. It also had less interlock and bypassing logic than a traditional central processing unit. This structure contributed to a simpler design with far fewer transistors, which was the key to low power.

In late 1995, Transmeta started hiring engineers to join the eight founders and begin mapping out details of the architecture. The first few hires were people whom Laird, Hunter, or Ditzel had known for years, starting with Godfrey D'Souza, a Sun engineer who would have been in the founding group had he been in a financial position to work without a salary. In 1996, some 80 more engineers were added, mostly mid-career engineers who had years of experience in the jobs they were to take on for Transmeta.

[3] The Transmeta founders credit a simplified sketch of their proposed architecture, which they called the "amoeba," with convincing the financial community that their idea could work. In this concept the x86 architecture is an ill-defined amoeba containing such features as segmentation, ASCII arithmetic, and variable-length instructions; the square inside the blob is the proposed VLIW processor and its functions.

[4] Code Morphing software transforms x86 binary code into individual instructions, called atoms, for the Crusoe processors. The compiler within the new software then looks for atoms that can be executed together and groups them into very long instruction words called molecules. Molecules may contain two or four atoms. In cases where an atom is to be executed alone, or only three atoms are to be executed together, the second or fourth position is filled by a no-operation instruction.

Signing on so many experienced engineers so fast in Silicon Valley's tight job market turned out to be surprisingly easy.

"My being old helped," Laird said. (He is 44.) "I've been around a long time; I know a lot of people."

Ditzel also had a lot of contacts. "I had worked at Bell Labs," he told Spectrum, "and when you work there, you tend to get invited to lots of places to see their secret projects. I had been doing a lot of work for IEEE and ACM [Association for Computing Machinery] on conferences, and I had gone to school with people who had gone on to be professors at universities. So I was able to just pick up the phone and call the right people."

When Ditzel and Laird made such calls, they provided little information to their prospective hires, just that they had a new company and were doing something really cool and new in computer architecture. After they were sure the person was interested, and was the right fit, they brought out a nondisclosure agreement. Only after it was signed did they reveal any details about their plans.

The experience of Guillermo Rozas was typical. Rozas, a software engineer and now Transmeta's director of product development, was at Hewlett-Packard Laboratories, in Palo Alto, in 1997 when he heard from a close friend who had signed on with Transmeta. As Rozas explained, "He was a really smart guy, and he told me there were really smart people here that would be fun to work with. I didn't know all that much more when I came in, other than that a lot of people I had known had mysteriously disappeared inside Transmeta."

Also recruited was Stephen Herrod, now 
director of software productization, who 
was at Stanford University, California, 
before joining Transmeta. He had done his 
Ph.D. dissertation on runtime code gener- 
ation, citing a number of papers and 
researchers in the field. "When I searched 
out where all those people were now, it 
turned out that all of them were at 
Transmeta," he told Spectrum. "I did know someone here from conferences, so I called him up and asked if I could come in. I was about the 15th software person hired, and the other 14 were largely the people whose
work I had been studying." 

In late 1996, after some hundred people 
were on board, Laird decided it was time to 
hire a few engineers right out of college. 
"You need a good distribution of experience," he said. "If you have all senior-level people, and there are a lot of details that need to be taken care of, they are not going to want to do that." He and Ditzel called their professor friends and asked for their best students, eventually hiring around 30 graduates. A number of these students were interviewed without even knowing what Transmeta did, only that their advisors had told them that Transmeta was a hot start-up.

Despite the large numbers of engineers being hired from Silicon Valley's top companies, such as Hewlett-Packard, MIPS Technologies, and Silicon Graphics (but not Intel), little information about Transmeta's work leaked.

"Our approach was simple: to use software as a key piece of the microprocessor,"
Ditzel said. "So if that one simple idea 
leaked out, our competitors could get a pro- 
ject going. If it didn't, then they couldn't 
have a competitive product out in five 
weeks — it would take them five years." 

They kept the secret virtually leak-free by what Ditzel calls rifle-shooting. "Leaks come from people you interview and don't hire. But if you rifle-shoot the exact people you want, all you have to do is impress them about what you're doing and hire them. Then once they've joined your company, they won't leak." He says some 90 percent of engineers offered jobs by Transmeta accepted.

"People were excited about this project 
because it was one of the first really differ- 
ent types of computer systems that had been 
designed in the past several years," Ditzel 
told Spectrum. "The hardware guys loved it 
because they could start with a blank sheet 
of paper; they didn't have to be compatible with an old instruction set. The software
guys liked it because they could ask the 
hardware guys for special features." 

Because the company was hiring so 
many senior people, the decision was made 
in the beginning that, even though funds 
were tight, every engineer would have a 
private office (as soon as they were avail- 
able — some employees did double up tem- 
porarily). Other amenities include a well- 
stocked kitchen with drinks, sandwich 
makings, and snacks. Dinner is ordered in 
four nights a week. 

The atmosphere is as open as a college 
campus (complete with a busy foosball 
table) — perhaps even more so. Said Keith 
Klayman, a member of the technical staff: 
"Like at a university, we can go to anyone 
here if we have a question. But at the uni- 
versity, the professor was in maybe once a 
week. Here, the high-level people are 
always around and accessible." 

Every engineer also has at home a company-provided computer that connects to the Internet through a high-speed digital subscriber line (DSL). With this equipment, people with families can go home for dinner, get back to their engineering work around 10 p.m., and then sleep late in the morning. One winter the company even rented a cabin in the Lake Tahoe ski area and equipped it with computers and DSL capability, so engineers could get their winter skiing in without losing time from their projects.

The lack of borders between hardware and software engineers at Transmeta is, employees report, unique in their experience. Whenever a technical problem is discussed, both hardware and software engineers team up to address it. Sometimes a problem faced by the software engineers is made solvable by a change in the hardware; sometimes it goes the other way. As a result, the company's fleet of rattletrap bicycles, used by the engineers to travel between the buildings housing the two teams, gets a lot of use [Fig. 5].

HOUSTON, WE HAVE A PROBLEM...

After three years of work, in August 1998, the first chips came back from IBM
Corp., which had signed on as manufac- 
turer. To check out the performance of the 
chips, the Transmeta engineers ran several 
benchmarks, both for Unix and Windows. 
The chips ran Unix benchmarks as fast as 
had been expected; the first magic trick 
had worked. 

But when the engineers assigned to per- 
formance analysis started testing Windows 
benchmarks, they had a nasty surprise. The 
Windows benchmarks reported scores far 
lower than expected. Transmeta had reached 
into its magic hat to pull out a rabbit and had 
instead come up with a turtle. 

"It was like in the Apollo 13 movie," Laird said. "We wanted to say, 'Whoops, Houston, we've got a problem here.'"

Laird was philosophical about the situation. "We're engineers," he told Spectrum. "We didn't need to panic. We needed to understand what was going on. And so we analyzed it, moved teams of hardware and software people onto it, and started fixing it."

But not all the engineers at Transmeta 
were so sanguine. 

"We had been riding high, blindly ex- 
pecting the chips to do everything that we 
had promised," recalled Klayman. "When 
they didn't, it was a real morale killer." Some 
of them felt it was never going to work, and 
since nobody was motivated, no work was 
getting done. Then Doug Laird told them 
to drop everything else they were doing, as 
there was still a chance to right the ship. 

The company held an all-hands meeting, in which Laird told everyone the truth: that they had run into a wall running Windows benchmarks. But he reassured them that, by working together, they could fix the problem. Murray Goldman, a member of the board of directors, pledged that the board would stand by their efforts, implying that more money would be raised, should it be needed.

Looking back, Laird said a problem might have been expected with Windows95 applications. "Most of us came from a Unix background; we knew how Unix applications behaved. But we didn't really understand Windows95," he said.

Apparently Windows95 still had a lot of 
old 16-bit code in it, whereas Unix (as well 
as Windows NT) used a flat memory 
model with pure 32-bit code. Supporting 
16-bit code was something that Transmeta
had decided to offload into software. 

Once they realized this, they redesigned 
the hardware to give better support to 
Windows95 applications. They also in- 
creased the size of the caches because 
Windows95 applications tend to use more 
memory than Unix applications. 

The redesign process added about a year 
to Transmeta's development time. In fact, 
getting products to market took longer than 
any of the founders had anticipated. "If we 
had had a better idea of how long it would 
have taken, we probably would not have 
done it, I suspect," said D'Souza. 

TO MARKET, TO MARKET 

While the engineers were struggling to 
redesign the chip to run Windows appli- 
cations at a reasonable speed, a marketing 
team was taking the show on the road, 
showing off their concept to OEMs, and 
asking them if Transmeta was making chips 
that would sell, and, if so, into what market. 

The feedback from the OEMs was almost 
unanimous, Ditzel said. While they had been 
presenting their product as appropriate for 
both the desktop and mobile markets, cus- 
tomers disliked the split focus. They wanted 
chips optimized for mobile computing. 

"Customers told us consistently," Ditzel 
said, "that they had pretty good chips for 
desktops and servers, but the road ahead for 
mobile chips looked horrible; there was
nothing coming out that was usable. So, 
they told us, if you are going to build us a 
chip, go build us a mobile chip." 

The most important parameter for the 
mobile market is a chip's power consump- 
tion. Ditzel said he and Laird had always 
thought that the hardware/software archi- 
tecture had a lot of potential for reducing 
a chip's power consumption, and in gen- 
eral the team designed the chip's circuits 
with low power in mind. They had not 
pitched this feature to venture capitalists, 
because, Ditzel said, it was impossible to 
know how significant the drop in power 
was going to be. 

By late 1998, with the initial market research complete and prototype chips on which to measure power consumption in hand, the decision to focus on mobile computing was made, and power consumption issues came to the forefront.

[5] Transmeta engineers built a PC-compatible board so they could run Windows applications on their chips. The board is held at left by Doug Laird, vice president of product development.

Transmeta's hardware and software teams are housed in two buildings, about 1 km apart in an office park. Designing chips with so many software functions requires interdisciplinary teams, so Transmeta's fleet of secondhand bicycles gets a lot of use in cross-campus commutes, even by chief executive officer David Ditzel [below].

POWERING DOWN 

"A number of people have said that 
designing lower-power chips means doing
a lot of little things — a little bit here, a lit- 
tle bit there," Laird told Spectrum. "And if you 
do a lot of it, the sum of it is good." 

One of the biggest little things that the 
Transmeta team did was to offload a good 
bit of the microprocessor function onto the 
software, which allowed them to design sim- 
ple streamlined hardware with about half 
the number of transistors of an x86 chip. 
"Obviously," continued Laird, "if you have 
fewer transistors, you burn less power." 

The team also used virtual devices to cut down on the amount of hardware. A virtual device is one that is not exactly the same as the device expected by the program but produces the same result. It works by having the Code Morphing software monitor the input and output instructions addressed to the device and send those instructions to the virtual device instead. For example, Crusoe incorporates on-chip the north bridge, normally a separate IC, which couples the processor to the peripheral component interconnect (PCI) bus and to external memory.

The north bridge features architecturally 
defined registers, to which the program 
sends input and output instructions. To be 
compatible with the architecture for which 
the instructions are written, those registers 
must be constructed so that any application, 
or the operating system, can manipulate 
them correctly. 

But rather than implementing those registers exactly as in a conventional north bridge, Transmeta engineers employed the Code Morphing software to intercept the instruction to the north bridge registers and send it instead to the registers defined in the Crusoe architecture. Ditzel predicts that the team will be virtualizing more circuits as time goes on.
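
The interception scheme described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Transmeta's code: the port numbers, the register names, and the execute_io helper are all hypothetical. The guest still issues ordinary input and output instructions to what it believes are north bridge registers; the translation layer routes them to a native register file with a different layout.

```python
"""Sketch: redirecting guest I/O to a virtual north bridge.

All port numbers and register names are hypothetical illustrations.
"""


class NativeRegisters:
    """Stand-in for registers defined in the host (Crusoe-style) architecture."""

    def __init__(self) -> None:
        self.values: dict[str, int] = {}


class VirtualNorthBridge:
    """Presents the guest-visible register interface, backed by native registers."""

    # Hypothetical guest I/O port -> native register name.
    PORT_TO_NATIVE = {0x10: "dram_timing", 0x14: "bus_control"}

    def __init__(self, native: NativeRegisters) -> None:
        self.native = native

    def claims(self, port: int) -> bool:
        return port in self.PORT_TO_NATIVE

    def write(self, port: int, value: int) -> None:
        # Store into the native register rather than a physical bridge register;
        # mask to 32 bits as a guest-visible register would.
        self.native.values[self.PORT_TO_NATIVE[port]] = value & 0xFFFFFFFF

    def read(self, port: int) -> int:
        # Registers never written read as zero in this sketch.
        return self.native.values.get(self.PORT_TO_NATIVE[port], 0)


def execute_io(op: str, port: int, bridge: VirtualNorthBridge, value: int = 0) -> int:
    """What translated code might do for a guest 'in' or 'out' instruction."""
    if not bridge.claims(port):
        raise NotImplementedError(f"port {port:#x} is not virtualized in this sketch")
    if op == "out":
        bridge.write(port, value)
        return bridge.read(port)
    if op == "in":
        return bridge.read(port)
    raise ValueError(f"unknown I/O operation: {op}")


if __name__ == "__main__":
    native = NativeRegisters()
    bridge = VirtualNorthBridge(native)
    execute_io("out", 0x10, bridge, 0x1234)
    print(execute_io("in", 0x10, bridge))  # 4660
    print(native.values)  # {'dram_timing': 4660}
```

The guest program never sees the native layout; only the mapping table knows about it, which is why a redesign of the underlying registers would not have to disturb the software running above it.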

Another technique is to turn 
on only those functional units that are 
absolutely needed to execute an instruc- 
tion. The process requires a separate clock 
for each combination of functional units 
that is turned on during the execution of 
an instruction. This approach was carried 
out so thoroughly that a vendor supplying 
a computer-aided design simulation tool 
complained that the Transmeta design 
"broke his tool" because the processor had 
over 10 000 clocks to control which units
get turned on, and when. 
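
The clock-gating idea can be sketched as a lookup from instruction class to the functional units it needs, with every other unit's clock held off. The instruction classes, unit names, and helper functions below are hypothetical. The second helper counts how many distinct gating patterns a given instruction mix would call for; in the scheme the article describes, each such pattern corresponds to a separate clock.

```python
"""Sketch: per-instruction clock gating of functional units.

The instruction classes and unit names are hypothetical.
"""
from typing import FrozenSet, Iterable

# Hypothetical mapping from instruction class to the units it must clock.
UNITS_NEEDED: dict[str, FrozenSet[str]] = {
    "add": frozenset({"alu0", "int_regfile"}),
    "load": frozenset({"agu", "dcache", "int_regfile"}),
    "fmul": frozenset({"fpu", "fp_regfile"}),
    "branch": frozenset({"branch_unit"}),
}
ALL_UNITS: FrozenSet[str] = frozenset().union(*UNITS_NEEDED.values())


def clock_enables(instr_class: str) -> dict[str, bool]:
    """Clock-enable bit per unit: on only if this instruction needs the unit."""
    needed = UNITS_NEEDED[instr_class]
    return {unit: unit in needed for unit in sorted(ALL_UNITS)}


def gating_patterns(instrs: Iterable[str]) -> set[FrozenSet[str]]:
    """Distinct unit combinations used; each would need its own gated clock."""
    return {UNITS_NEEDED[i] for i in instrs}


if __name__ == "__main__":
    enables = clock_enables("load")
    print(sum(enables.values()), "of", len(enables), "units clocked")  # 3 of 7 units clocked
    print(len(gating_patterns(["add", "load", "add", "fmul"])))  # 3
```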

But the biggest breakthrough in low- 
power design came with the development 
of the so-called LongRun technology, which 
uses the Code Morphing software to mon- 
itor applications as they are running. Then 
LongRun hardware adjusts both the supply 
voltage and the clock frequency so that each 
application runs only as fast as it must to get 
the job done. Since the processor is running 
at maximum efficiency, it is maximizing bat- 
tery life. 

Traditional power-management systems also adjust power, but are much less refined. They often try to extend battery life by varying the duty cycle, repeatedly turning the central processing unit on for a fraction of a second, then off for a fraction of a second. "Imagine that you wanted to make the light in a room half as bright," explained Marc Fleischmann, manager of the LongRun power management team. "It would seem silly to do that by flipping the light switch on and off rapidly. But that's exactly how power management works on traditional notebook computers."

Rather than a light switch, Fleischmann 
compares LongRun to a dimmer control. 
While applications are running, Trans- 
meta's software observes the traditional 
power management states and the time 
spent in the sleep mode; then on-chip
LongRun circuitry reduces the frequency 
and the voltage to precisely match just 
what the user needs. 

"If you spend 40 percent of your time in 
sleep mode, that means you only need to 
run at 60 percent of the performance level. 
So we reduce the frequency from 700 MHz 
to about 400 MHz, say. And we ramp down 
the voltage correspondingly. Adjusting 
both frequency and voltage is a far more 
efficient way to extend battery life," 
Fleischmann told Spectrum. 
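
Fleischmann's arithmetic can be sketched as a small policy function. The operating-point table (frequencies and core voltages) is invented for illustration and is not Crusoe's published data; the power estimate uses the standard CMOS approximation that dynamic power scales as C * V^2 * f, ignoring leakage and sleep-state power. With 40 percent idle time the target is 420 MHz, which this hypothetical table rounds up to 433 MHz.

```python
"""Sketch: a LongRun-style choice of frequency and voltage.

The operating-point table is hypothetical, not Crusoe's published data.
"""

# Hypothetical (frequency in MHz, core voltage in V) pairs, slowest first.
OPERATING_POINTS: list[tuple[int, float]] = [
    (300, 1.10), (366, 1.15), (433, 1.20), (500, 1.30),
    (566, 1.40), (633, 1.50), (700, 1.60),
]


def pick_operating_point(sleep_fraction: float) -> tuple[int, float]:
    """Slowest point that still covers the demand implied by idle time."""
    if not 0.0 <= sleep_fraction <= 1.0:
        raise ValueError("sleep_fraction must be between 0 and 1")
    f_max = OPERATING_POINTS[-1][0]
    target_mhz = (1.0 - sleep_fraction) * f_max  # e.g. 40% idle -> 420 MHz
    for freq, volt in OPERATING_POINTS:
        if freq >= target_mhz:
            return freq, volt
    return OPERATING_POINTS[-1]


def relative_dynamic_power(freq_mhz: float, volt: float, duty: float = 1.0) -> float:
    """CMOS approximation P ~ C * V^2 * f with C = 1.

    Leakage and the power drawn while asleep are ignored.
    """
    return duty * volt**2 * freq_mhz


if __name__ == "__main__":
    sleep = 0.40
    freq, volt = pick_operating_point(sleep)
    print(freq, volt)  # 433 1.2
    f_max, v_max = OPERATING_POINTS[-1]
    on_off = relative_dynamic_power(f_max, v_max, duty=1.0 - sleep)  # light-switch style
    scaled = relative_dynamic_power(freq, volt)  # dimmer style
    print(round(scaled / on_off, 2))  # 0.58
```

Under these assumed numbers, running continuously at the lower frequency and voltage draws a bit under 60 percent of the power of running flat out at 700 MHz for 60 percent of the time, because the voltage term enters squared.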

"The major point," added Laird, "is that 
LongRun is an extension of power management, not a substitution for it."

All told, the efforts to reduce power 
consumption on the Crusoe chips can re- 
duce power by a factor between three and 
30, depending on the application, com- 
pared with a typical x86 processor, accord- 
ing to Fleischmann. 

SOFTWARE'S EDGE 

As the design of the microprocessor 
evolved, other advantages of moving func- 
tions into software became apparent. 
"Having software involved gave us more 
opportunities than we initially thought," 
Ditzel said. 

Processor upgrades are simplified because 
the layer of software between the applica- 
tions and the chip frees the designers to 
change the chip architecture without caus- 
ing x86 software developers to have to 
recompile their code. Code Morphing soft- 
ware can be updated independently of hard- 
ware by loading a software upgrade into 
Flash memory. 

The software also helps the debugging 
process. When the hardware design team 
got the very first silicon, they found plenty 
of bugs. They knew that the software layer 
would help them debug the chip, but no one 
appreciated ahead of time just how pow- 
erful that help would be, according to 
D'Souza. They were able to work around 
a lot of the bugs, he said, by performing 
operations in a different way. 

The engineers were always able to boot 
Windows, even on buggy silicon. As each 
bug was found (and fixed with software), 
it was added to the list of revisions for the 
next design. 

What's more, the software layer was also 
used to increase performance by improving 
the timing of critical paths. For instance, 
engineers found that when two particular 
atoms were paired together in a molecule, 
the processor ran sluggishly. Otherwise, the 
chip could run at a much faster clip. So the 
hardware designers asked the software 
designers to modify the scheduler so that 
these two atoms would not appear in the 
same molecule. "All of a sudden," said 
D'Souza, "we were running at 600 MHz 
instead of 466 MHz." 
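
The scheduler change D'Souza describes amounts to a packing constraint: two particular kinds of atom may not share a molecule. A minimal in-order sketch follows; the molecule width, the atom names, and the forbidden pair are hypothetical, and a real scheduler would also have to respect data dependencies and resource limits, which this sketch ignores.

```python
"""Sketch: packing atoms into molecules while keeping one pair apart.

Molecule width, atom names, and the forbidden pair are hypothetical.
Data dependencies and unit limits are ignored.
"""
from typing import Iterable

MOLECULE_WIDTH = 4  # hypothetical number of atom slots per molecule
# Hypothetical atom kinds whose pairing in one molecule slows a critical path.
FORBIDDEN_PAIRS = {frozenset({"load", "shift"})}


def conflicts(kind: str, molecule: list[str]) -> bool:
    """True if adding `kind` would pair it with a forbidden partner."""
    return any(frozenset({kind, other}) in FORBIDDEN_PAIRS for other in molecule)


def pack(atoms: Iterable[str]) -> list[list[str]]:
    """Greedy, in-order packing: start a new molecule when full or on a conflict."""
    molecules: list[list[str]] = []
    current: list[str] = []
    for kind in atoms:
        if len(current) == MOLECULE_WIDTH or conflicts(kind, current):
            molecules.append(current)
            current = []
        current.append(kind)
    if current:
        molecules.append(current)
    return molecules


if __name__ == "__main__":
    print(pack(["add", "load", "shift", "add"]))
    # [['add', 'load'], ['shift', 'add']]
```

The price of the constraint is that some molecules leave slots empty; in Transmeta's case, D'Souza credits the change with raising the clock from 466 MHz to 600 MHz.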

CRUSOE LIVES 

By August 1999, the first of the re- 
designed chips came back from the IBM fab. 
This time, it ran Windows applications just 
fine. This chip, for the mobile computing 
market, became the TM5400. The origi- 
nal design, which was intended for running 
Linux for the Internet appliance market, 
became the TM3120. 

The TM5400 is similar to the TM3120, but has added the LongRun feature to conserve power. This chip also has more on-chip cache memory than the TM3120 to support x86 applications for Windows-based notebook computers. The TM3120 runs at 400 MHz, while the TM5400 runs at up to 700 MHz.

Transmeta engineers intentionally de- 
signed Crusoe to be simpler than conven- 
tional x86s slated for mobile applications, 
but to achieve comparable performance by 
running at a higher frequency. The fastest 
mobile Pentium III clocks in at 650 MHz.

Of course, the performance of the 
Crusoe chips depends on the application. 
"I think it's fair to say that Crusoe is faster
on some applications and not as fast on 
others," said Ditzel. 

For most mobile applications, all of the 
TM5400's processing power is often not 
even needed. The effectiveness of LongRun 
lies in making the processor run at just the 
right frequency to deliver the performance 
demanded by the application while con- 
serving power. 

The microprocessor family was for- 
mally branded Crusoe, after the fictional 
adventurer and traveler, Robinson Crusoe. 
"It was friendly, short, and easy to remem- 
ber," Ditzel said. "So you'll remember it's 
a mobile chip." 

Finally, on 19 January 2000, after nearly 
five years of effort and over $100 million 
invested, Transmeta pulled back its curtain 
at a large press conference at Villa Mon- 
talvo, a grand old estate in the hills of 
Saratoga, Calif. 

Meanwhile, engineers at one of Trans- 
meta's unmarked buildings raised a huge 
black flag with the yellow Crusoe logo from 
the roof of their building. The flag could be 
seen by Intel engineers driving to and from 
their nearby offices. 

Bennett Smith, a consultant in microarchitecture, computing platforms, and related intellectual property, is impressed by Transmeta's technology. "They have a sophisticated approach to power consumption that looks pretty amazing," he told Spectrum. On the negative side, he has heard concerns that the company's chips are just too expensive. "Companies designing for the portable market may have difficulty justifying the intellectual property premiums built into Transmeta's business plan," he said.

Smith and Bruce Shriver are co-authors of The Anatomy of a High-Performance Microprocessor: A Systems Perspective (IEEE Computer Society Press, Los Alamitos, Calif., 1998).

Writing in Cahners Microprocessor Report, 
14 February 2000, Tom R. Halfhill also ex- 
pressed cautious praise: "Revolutionary 
may be an overstatement, but they are 
definitely different. . . . The TM5400's LongRun feature is one of the most innovative
technologies introduced by Transmeta. 
To our knowledge, no other micropro- 
cessors can conserve power by scaling its 
voltage and clock frequency in response 
to the variable demands of software." 



Indeed, the chips still continue to amaze 
their creators. 

Referring to the prototype system that 
the Transmeta team used to test the Crusoe 
chips, Rozas said, "I've been seeing these 
things run now for a year and a half. I know 
them inside out. Yet, I am still amazed every 
time I start it up and [a Crusoe-chip computer] looks like a normal PC."

"Considering the complexity of the project, it is amazing how well it works, how fast it works, and how low-power it is," Fleischmann commented. "For the end-user, this is just a normal PC, but under the hood, it is a technological marvel. I am in a state of wonder, too, and I am proud."

THE NEXT GENERATION 

Variations of the current generation 
(both low-cost versions and higher-perfor- 
mance versions) are also being designed. 
(The part numbers were purposely picked to be in the middle of the range, leaving room for both new versions.)

Transmeta's next generation may have a 
fundamentally different architecture, even 
a different instruction set — whatever it takes 
to make it better, because use of Code 
Morphing software obviates the need for 
legacy hardware. 

The design will most likely include the 
latest submicron CMOS technology, in- 
cluding shielded clock lines. The computer- 
aided design tools will need to make accu- 
rate models of inductive coupling between 
the interconnect structures on the chip. To 
the engineers, this is a chance, once again, 
to start with a blank sheet of paper and to 
rethink the first generation's tradeoffs 
between hardware and software. To the user, 
though, the next Crusoe will still appear 
as an x86. 

"Usually you say the next generation will 
be bigger and better," Ditzel said. "But in 
this case, I'll say it will be smaller and require even less power."

TO PROBE FURTHER 

Information on Transmeta Corp., white papers 
describing in detail its Code Morphing tech- 
nology, videos of the Crusoe product launch 
event, recent news articles, and employment 
opportunities at the company are available at 
www.transmeta.com. 

Detailed analysis of the Crusoe processor 
architecture is to be found in the article 
"Transmeta breaks x86 low power barrier," by 
Tom R. Halfhill, Microprocessor Report, 
14 February 2000, p. 1 and pp. 9-18. 

Transmeta will be making presentations at the 
Embedded Processor Forum in June (see 
www.mdronline.com) and at the IEEE's Hot 
Chips meeting in August in California (see 
www.hotchips.org). 